MASTERING MATHEMATICAL FINANCE 


Probability for Finance
EKKEHARD KOPP 


JAN MALCZAK 
TOMASZ ZASTAWNIAK 


CAMBRIDGE 


Probability for Finance 


Students and instructors alike will benefit from this rigorous, unfussy text, which 
keeps a clear focus on the basic probabilistic concepts required for an understanding of 
financial market models, including independence and conditioning. Assuming only 
some calculus and linear algebra, the text applies key results of measure and 
integration to probability spaces and random variables, culminating in Central Limit 
Theory. Consequently it provides essential pre-requisites to graduate-level study of 
modern finance and, more generally, to the study of stochastic processes. 

Results are proved carefully and the key concepts are motivated by concrete 
examples drawn from financial market models. Students can test their understanding 
through the large number of exercises that are integral to the text. 


EKKEHARD KOPP is Emeritus Professor of Mathematics at the University of Hull. He 
has published over 50 research papers and five books, on measure and probability, 
stochastic analysis and mathematical finance. He has taught in the UK, Canada and 
South Africa, and he serves on the editorial board of the AIMS Library Series. 


JAN MALCZAK has published over 20 research papers and taught courses in analysis, 
differential equations, measure and probability, and the theory of stochastic differential 
processes. He is currently Professor of Mathematics at AGH University of Science and 
Technology in Kraków, Poland. 


TOMASZ ZASTAWNIAK holds the Chair of Mathematical Finance at the University of 
York. He has authored about 50 research publications and four books. He has 
supervised four PhD dissertations and around 80 MSc dissertations in mathematical 
finance. 


Mastering Mathematical Finance 


Mastering Mathematical Finance is a series of short books that cover all 
core topics and the most common electives offered in Master’s programmes 
in mathematical or quantitative finance. The books are closely coordinated 
and largely self-contained, and can be used efficiently in combination but 
also individually. 


The MMF books start financially from scratch and mathematically as- 
sume only undergraduate calculus, linear algebra and elementary proba- 
bility theory. The necessary mathematics is developed rigorously, with em- 
phasis on a natural development of mathematical ideas and financial intu- 
ition, and the readers quickly see real-life financial applications, both for 
motivation and as the ultimate end for the theory. All books are written for 
both teaching and self-study, with worked examples, exercises and solu- 
tions. 

[DMFM] Discrete Models of Financial Markets,
Marek Capiński, Ekkehard Kopp
[PF] Probability for Finance,
Ekkehard Kopp, Jan Malczak, Tomasz Zastawniak
[SCF] Stochastic Calculus for Finance,
Marek Capiński, Ekkehard Kopp, Janusz Traple
[BSM] The Black-Scholes Model,
Marek Capiński, Ekkehard Kopp
[PTRM] Portfolio Theory and Risk Management,
Maciej J. Capiński, Ekkehard Kopp
[NMFC] Numerical Methods in Finance with C++,
Maciej J. Capiński, Tomasz Zastawniak
[SIR] Stochastic Interest Rates,
Daragh McInerney, Tomasz Zastawniak
[CR] Credit Risk,
Marek Capiński, Tomasz Zastawniak
[FE] Financial Econometrics,
Marek Capiński
[SCAF] Stochastic Control Applied to Finance,
Szymon Peszat, Tomasz Zastawniak


Series editors: Marek Capiński, AGH University of Science and Technology,
Kraków; Ekkehard Kopp, University of Hull; Tomasz Zastawniak,
University of York


Probability for Finance 


EKKEHARD KOPP 
University of Hull, Hull, UK 


JAN MALCZAK 
AGH University of Science and Technology, Kraków, Poland 


TOMASZ ZASTAWNIAK 
University of York, York, UK 




CAMBRIDGE 


UNIVERSITY PRESS 
University Printing House, Cambridge CB2 8BS, United Kingdom 


Published in the United States of America by Cambridge University Press, New York 
Cambridge University Press is part of the University of Cambridge. 

It furthers the University’s mission by disseminating knowledge in the pursuit of 
education, learning, and research at the highest international levels of excellence. 
www.cambridge.org 
Information on this title: www.cambridge.org/978 1 107002494 
© Ekkehard Kopp, Jan Malczak and Tomasz Zastawniak 2014 


This publication is in copyright. Subject to statutory exception 
and to the provisions of relevant collective licensing agreements, 
no reproduction of any part may take place without the written 
permission of Cambridge University Press. 


First published 2014 
Printed in the United Kingdom by Clays, St Ives plc 
A catalogue record for this publication is available from the British Library 
Library of Congress Cataloguing in Publication data 


ISBN 978-1-107-00249-4 Hardback 
ISBN 978-0-521-17557-9 Paperback 


Additional resources for this publication at www.cambridge.org/978 1 107002494 


Cambridge University Press has no responsibility for the persistence or accuracy of 
URLs for external or third-party internet websites referred to in this publication, 
and does not guarantee that any content on such websites is, or will remain, 
accurate or appropriate. 


Contents


Preface
1 Probability spaces
1.1 Discrete examples
1.2 Probability spaces
1.3 Lebesgue measure
1.4 Lebesgue integral
1.5 Lebesgue outer measure
2 Probability distributions and random variables
2.1 Probability distributions
2.2 Random variables
2.3 Expectation and variance
2.4 Moments and characteristic functions
3 Product measure and independence
3.1 Product measure
3.2 Joint distribution
3.3 Iterated integrals
3.4 Random vectors in $\mathbb{R}^n$
3.5 Independence
3.6 Covariance
3.7 Proofs by means of d-systems
4 Conditional expectation
4.1 Binomial stock prices
4.2 Conditional expectation: discrete case
4.3 Conditional expectation: general case
4.4 The inner product space $L^2(P)$
4.5 Existence of E(X | G) for integrable X
4.6 Proofs
5 Sequences of random variables
5.1 Sequences in $L^2(P)$
5.2 Modes of convergence for random variables
5.3 Sequences of i.i.d. random variables
5.4 Convergence in distribution
5.5 Characteristic functions and inversion formula
5.6 Limit theorems for weak convergence
5.7 Central Limit Theorem
Index

Preface 


Mathematical models of financial markets rely in fundamental ways on 
the concepts and tools of modern probability theory. This book provides a 
concise but rigorous account of the probabilistic ideas and techniques most 
commonly used in such models. The treatment is self-contained, requiring 
only calculus and linear algebra as pre-requisites, and complete proofs are 
given — some longer constructions and proofs are deferred to the ends of 
chapters to ensure the smooth flow of key ideas. 

New concepts are motivated through examples drawn from finance. The 
selection and ordering of the material are strongly guided by the applica- 
tions we have in mind. Many of these applications appear more fully in 
later volumes of the ‘Mastering Mathematical Finance’ series, including 
[SCF], [BSM] and [NMFC]. This volume provides the essential mathe- 
matical background of the financial models described in detail there. 

In adding to the extensive literature on probability theory we have not 
sought to provide a comprehensive treatment of the mathematical theory 
and its manifold applications. We focus instead on the more limited objec- 
tive of writing a fully rigorous, yet concise and accessible, account of the 
basic concepts underlying widely used market models. The book should 
be read in conjunction with its partner volume [SCF], which describes the 
properties of stochastic processes used in these models. 

In the first two chapters we introduce probability spaces, distributions 
and random variables from scratch. We assume a basic level of mathemat- 
ical maturity in our description of the principal aspects of measures and 
integrals, including the construction of the Lebesgue integral and the im- 
portant convergence results for integrals. Beginning with discrete examples 
familiar to readers of [DMFM], we motivate each construction by means 
of specific distributions used in financial modelling. Chapter 3 introduces 
product measures and random vectors, and highlights the key concept of 
independence, while Chapter 4 is devoted to a thorough discussion of con- 
ditioning, moving from the familiar discrete setting via the properties of 
inner product spaces and the Radon–Nikodym theorem to the construction
of general conditional expectations for integrable random variables. The fi- 
nal chapter explores key limit theorems for sequences of random variables, 
beginning with orthonormal sequences of square-integrable functions,
followed by a discussion of the relationships between various modes of
convergence, and concluding with an introduction to weak convergence and
the Central Limit Theorem for independent identically distributed random 
variables of finite mean and variance. 

Concrete examples and the large number of exercises form an integral 
part of this text. Solutions to the exercises and further material can be found 
at www.cambridge.org/978 1107002494. 


1 Probability spaces


1.1 Discrete examples 
1.2 Probability spaces 
1.3 Lebesgue measure
1.4 Lebesgue integral 
1.5 Lebesgue outer measure 


In all spheres of life we make decisions based upon incomplete informa- 
tion. Frameworks for predicting the uncertain outcomes of future events 
have been around for centuries, notably in the age-old pastime of gambling. 
Much of modern finance draws on this experience. Probabilistic models 
have become an essential feature of financial market practice. 

We begin at the beginning: this chapter is an introduction to basic con- 
cepts in probability, motivated by simple models for the evolution of stock 
prices. Emphasis is placed on the collection of events whose probability 
we need to study, together with the probability function defined on these 
events. For this we use the machinery of measure theory, including the 
construction of Lebesgue measure on R. We introduce and study integra- 
tion with respect to a measure, with emphasis on powerful limit theorems. 
In particular, we specialise to the case of Lebesgue integral and compare it 
with the Riemann integral familiar to students of basic calculus. 


1.1 Discrete examples 


The crucial feature of financial markets is uncertainty related to the fu- 
ture prices of various quantities, like stock prices, interest rates, foreign 
exchange rates, market indices, or commodity prices. Our goal is to build 
a mathematical model capturing this aspect of reality. 




Example 1.1 

Consider how we could model stock prices. The current stock price (the
spot price) is usually known, say 10. We may be interested in the price at
some fixed future time. This future price involves some uncertainty.
Suppose first that in this period of time the stock price jumps a number of
times, going either up or down by 0.50 (such a price change is called a tick).
After two such jumps there will be three possible prices: 9, 10, 11. After 20
jumps there will be a wider range of possible prices: 0, 1, 2, ..., 19, 20.


The set of all possible outcomes will be denoted by $\Omega$ and called the
sample space. The elements of $\Omega$ will be denoted by $\omega$. For now we
assume that $\Omega$ is a finite set.


Example 1.2 

If we are interested in the prices after two jumps, we could take
$\Omega = \{9, 10, 11\}$. If we want to describe the prices after 20 jumps, we
would take $\Omega = \{0, 1, 2, \ldots, 19, 20\}$.


The next step in building a model is to answer the following question:
for a subset $A \subset \Omega$, called an event, what is the probability that
the outcome lies in $A$? The number representing the answer will be denoted by
$P(A)$, and the convention is to require $P(A) \in [0, 1]$ with $P(\Omega) = 1$
and $P(\emptyset) = 0$. We shall write $p_\omega = P(\{\omega\})$ for any
$\omega \in \Omega$. Given $p_\omega$ for all $\omega \in \Omega$, the function
$P$ is then constructed for any $A \subset \Omega$ by adding the values attached
to the elements of $A$,

\[ P(A) = \sum_{\omega \in A} p_\omega. \]

This immediately implies an important property of $P$, called additivity,

\[ P(A \cup B) = P(A) + P(B) \quad \text{for any disjoint events } A, B. \]

By induction, it readily extends to

\[ P\Big(\bigcup_{i=1}^{m} A_i\Big) = \sum_{i=1}^{m} P(A_i) \quad \text{for any pairwise disjoint events } A_1, \ldots, A_m. \]
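The construction of $P$ from the numbers $p_\omega$ can be sketched in a few lines of Python. This is only an illustration: the helper name `make_probability` is hypothetical, and the equal weights on the three-price sample space of Example 1.2 are an assumed choice made here for concreteness.

```python
from fractions import Fraction


def make_probability(point_masses):
    """Return P with P(A) = sum of p_omega over omega in A.

    `point_masses` maps each outcome omega to p_omega (illustrative helper,
    not part of the text).
    """
    if sum(point_masses.values()) != 1:
        raise ValueError("point masses must sum to 1")

    def P(event):
        return sum(point_masses[w] for w in event)

    return P


# Assumed weights: equal probabilities on the sample space {9, 10, 11}.
masses = {9: Fraction(1, 3), 10: Fraction(1, 3), 11: Fraction(1, 3)}
P = make_probability(masses)

A, B = {9}, {10, 11}  # two disjoint events
additivity_holds = P(A | B) == P(A) + P(B)
```

Exact fractions are used so that the additivity identity can be compared with `==` rather than up to rounding error.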




Example 1.3 
Consider $\Omega = \{9, 10, 11\}$. The simplest choice is to assign equal
probabilities $p_9 = p_{10} = p_{11} = \frac{1}{3}$ to all single-element
subsets of $\Omega$.


Example 1.4
In the case of $\Omega = \{0, 1, 2, \ldots, 19, 20\}$ we could, once again, try
equal probabilities for all single-element subsets of $\Omega$, namely,
$p_0 = p_1 = \cdots = p_{20} = \frac{1}{21}$.


The uniform probability on a finite $\Omega$ assigns equal probabilities
$p_\omega = \frac{1}{N}$ for each $\omega \in \Omega$, where $N$ is the number of
elements in $\Omega$.


Example 1.5 

Uniform probability does not appear to be consistent with the scheme
in Example 1.1, where the stock prices result from consecutive jumps
by $\pm 0.50$ from an initial price 10. In the case of two consecutive jumps
one might argue that the middle price 10 should carry more weight since
it can be arrived at in two ways (up-down or down-up), while either of
the other two values can occur in just one way (down-down for 9, up-up
for 11). Hence, price 10 would be twice as likely as 9 or 11.

To reflect these considerations on $\Omega = \{9, 10, 11\}$ we can take
$p_9 = \frac{1}{4}$, $p_{10} = \frac{1}{2}$, $p_{11} = \frac{1}{4}$.


Example 1.6
Similarly, for $\Omega = \{0, 1, 2, \ldots, 19, 20\}$ we can take
$p_n = \binom{20}{n} \frac{1}{2^{20}}$, where
$\binom{20}{n} = \frac{20!}{n!(20-n)!}$ is the number of scenarios consisting of
$n$ upwards and $20 - n$ downwards price jumps of 0.50 from the initial price 10,
with each scenario equally likely. This is illustrated in Figure 1.1.


In general, when for $\Omega = \{0, 1, \ldots, N\}$ we have
$p_n = \binom{N}{n} \frac{1}{2^N}$, we call this the symmetric binomial
probability. Clearly, $\sum_{n=0}^{N} p_n = 1$.
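The counting argument behind the symmetric binomial probability can be checked directly. The sketch below (function names are illustrative, not from the text) computes $p_n$ exactly for 20 jumps and, for a smaller number of jumps, counts the up/down scenarios by brute force to confirm that they agree with the binomial coefficients.

```python
from fractions import Fraction
from itertools import product
from math import comb


def symmetric_binomial(N):
    """Exact values p_n = C(N, n) / 2**N for n = 0, ..., N."""
    return [Fraction(comb(N, n), 2**N) for n in range(N + 1)]


def count_scenarios(N):
    """Brute force: number of up/down paths with exactly n upward jumps."""
    counts = [0] * (N + 1)
    for path in product((0, 1), repeat=N):  # 1 = up, 0 = down
        counts[sum(path)] += 1
    return counts


p = symmetric_binomial(20)

# With n upward and 20 - n downward ticks of 0.50 from 10 the price is n.
prices = [10 + 0.5 * n - 0.5 * (20 - n) for n in range(21)]
```

The brute-force count is only run for a small number of jumps in practice, since the number of scenarios doubles with every additional jump.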


Figure 1.1 Binomial probability and additive jumps.


The mechanism of price jumps by constant additive ticks is not entirely 
satisfactory as a model for stock prices. After sufficiently many jumps, 
the range of possible prices will include negative values. To have a more 
realistic model we need to adjust this mechanism of price jumps. 


Example 1.7 

The first price jump of $\pm 0.50$ means that the price changes by $\pm 5\%$. In
subsequent steps we shall now allow the prices to go up or down by 5% rather
than by a constant tick of 0.50. The possible prices will then be
$\Omega = \{\omega_n : n = 0, 1, 2, \ldots, 19, 20\}$ after 20 jumps, with
$\omega_n = 10 \times 1.05^n \times 0.95^{20-n}$. The prices will remain positive
for any number of jumps. We choose the probabilities in a similar manner as
before, $p_{\omega_n} = \binom{20}{n} \frac{1}{2^{20}}$. Compare Figure 1.2
with Figure 1.1 to observe a subtle but crucial shift in the distribution of
stock prices.


The above examples restrict the possible stock prices to a finite set. In an
attempt to extend the model we might want to allow an infinite sequence
of possible prices, that is, a countable set $\Omega$.


Figure 1.2 Binomial probability and multiplicative jumps.


Example 1.8
Suppose that the number of stock price jumps occurring within a fixed time
period is not prescribed, but can be an arbitrary integer $N$. To be specific,
suppose that the probability of $N$ jumps is

\[ q_N = \frac{\lambda^N e^{-\lambda}}{N!} \]

with $N = 0, 1, 2, \ldots$ for some parameter $\lambda > 0$. The probability of
large $N$ is small, but there is no upper bound on $N$, allowing for some hectic
trading. Clearly,

\[ \sum_{N=0}^{\infty} q_N = e^{-\lambda} \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} = 1. \]

This is called the Poisson probability with parameter $\lambda$.

Furthermore, conditioned on there being $N$ jumps, the possible final
stock prices will be described by means of the binomial probability and
multiplicative jumps. We assume, like in Example 1.7, that each jump
increases/reduces the stock price by 5% with probability $\frac{1}{2}$. The
stock price at time $T$ will become

\[ S(T) = 10 \times 1.05^n \times 0.95^{N-n} \]

with probability

\[ p_{N,n} = q_N \binom{N}{n} \frac{1}{2^N}, \]

that is, the probability $q_N$ of $N = 0, 1, 2, \ldots$ jumps multiplied by the
probability $\binom{N}{n} \frac{1}{2^N}$ of $n$ upwards price movements among
those $N$ jumps, where $0 \le n \le N$. We take $\Omega$ to be the set of such
pairs of integers $N, n$. The formula $P(A) = \sum_{\omega \in A} p_\omega$
defining the probability of an event now includes infinite sets
$A \subset \Omega$.
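A numerical sketch of Example 1.8 follows. The parameter value `lam = 20.0` and the truncation level `N_max` are assumptions made for this illustration only; the text does not fix them. The infinite sample space is cut off at `N_max` jumps, which leaves out only a negligible Poisson tail for this choice of parameter.

```python
from math import comb, exp, factorial


def q(N, lam):
    """Poisson probability of N jumps."""
    return lam**N * exp(-lam) / factorial(N)


def p(N, n, lam):
    """Probability of N jumps, n of them upward: q_N * C(N, n) / 2**N."""
    return q(N, lam) * comb(N, n) / 2**N


def price(N, n):
    """Stock price after n upward and N - n downward jumps of 5%."""
    return 10 * 1.05**n * 0.95 ** (N - n)


lam = 20.0   # hypothetical parameter value
N_max = 150  # truncation level for the countable sample space

pairs = [(N, n) for N in range(N_max + 1) for n in range(N + 1)]
total = sum(p(N, n, lam) for N, n in pairs)

# Probability of an infinite event: "the final price is above 10".
above = sum(p(N, n, lam) for N, n in pairs if price(N, n) > 10)
```

The event "final price above 10" contains infinitely many pairs $(N, n)$, so its probability is a genuinely infinite sum; the code approximates it by the truncated one.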


This example shows that it is natural to consider a stronger version of
the additivity property:

\[ P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i) \]

for any sequence of pairwise disjoint events $A_1, A_2, \ldots \subset \Omega$.
This is known as countable additivity.


Example 1.9 

Another example where a countable set emerges in a natural way is related
to modelling the instant when something unpredictable may happen.
Time is measured by the number of discrete steps (of some fixed but
unspecified length). At each step there is an upward/downward price jump
with probabilities $p, 1 - p \in (0, 1)$, respectively. The probability that an
upward jump occurs for the first time at the $n$th step can be expressed as
$p_n = (1 - p)^{n-1} p$. It is easy to check that $\sum_{n=1}^{\infty} p_n = 1$,
which gives a probability on $\Omega = \{1, 2, \ldots\}$. This defines the
geometric probability.
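The normalisation of the geometric probability follows from the geometric series. The short derivation below is a sketch that uses only $0 < p < 1$; the summation index $k = n - 1$ is introduced here for convenience.

```latex
\[
\sum_{n=1}^{\infty} p_n
  = p \sum_{n=1}^{\infty} (1-p)^{n-1}
  = p \sum_{k=0}^{\infty} (1-p)^{k}
  = p \cdot \frac{1}{1-(1-p)}
  = 1 .
\]
```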


1.2 Probability spaces 


Countable additivity turns out to be the perfect condition for probability
theory. The actual construction of a probability measure can present
difficulties. In particular, it is sometimes impossible to define $P$ for all
subsets of $\Omega$. The domain of $P$ has to be specified, and it is natural to
impose some restrictions on that domain ensuring that countable additivity can
be formulated.


Definition 1.10 
A probability space is a triple $(\Omega, \mathcal{F}, P)$ as follows.
(i) $\Omega$ is a non-empty set (called the sample space, or set of scenarios).
(ii) $\mathcal{F}$ is a family of subsets of $\Omega$ (called events) satisfying
the following conditions:
• $\Omega \in \mathcal{F}$;
• if $A_i \in \mathcal{F}$ for $i = 1, 2, \ldots$, then
$\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$ (we say that $\mathcal{F}$ is
closed under countable unions);
• if $A \in \mathcal{F}$, then $\Omega \setminus A \in \mathcal{F}$ (we say that
$\mathcal{F}$ is closed under complements).
Such a family of sets $\mathcal{F}$ is called a σ-field on $\Omega$.
(iii) $P$ assigns numbers to events,

\[ P : \mathcal{F} \to [0, 1], \]

and we assume that
• $P(\Omega) = 1$;
• for all sequences of events $A_i \in \mathcal{F}$, $i = 1, 2, 3, \ldots$ that
are pairwise disjoint ($A_i \cap A_j = \emptyset$ for $i \ne j$) we have

\[ P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i). \]

This property is called countable additivity.
A function $P$ satisfying these conditions is called a probability measure
(or simply a probability).
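On a finite sample space the conditions in part (ii) of the definition can be verified mechanically. The sketch below is an illustration under that finiteness assumption (where countable unions reduce to finite ones); the function name `is_sigma_field` and the example families are hypothetical.

```python
from itertools import combinations


def is_sigma_field(omega, family):
    """Check the three conditions of Definition 1.10(ii) on a finite set.

    On a finite sample space every countable union is a finite union,
    so closure under pairwise unions is sufficient.
    """
    omega = frozenset(omega)
    family = {frozenset(A) for A in family}
    if omega not in family:
        return False
    if any(omega - A not in family for A in family):
        return False
    return all(A | B in family for A, B in combinations(family, 2))


omega = {9, 10, 11}

# Events that only record whether the price has moved away from 10.
coarse = [set(), {10}, {9, 11}, omega]

# Not a sigma-field: the complement {10, 11} of {9} is missing.
broken = [set(), {9}, omega]

# The family of all subsets is always a sigma-field.
power_set = [set(c) for r in range(len(omega) + 1)
             for c in combinations(sorted(omega), r)]
```

The family `coarse` shows that a σ-field can be strictly smaller than the family of all subsets while still satisfying every condition.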


Exercise 1.1 Let $\mathcal{F}$ be a σ-field and
$A_1, A_2, \ldots \in \mathcal{F}$. Show that
$\bigcap_{i=1}^{n} A_i \in \mathcal{F}$ for each $n = 1, 2, \ldots$ and that
$\bigcap_{i=1}^{\infty} A_i \in \mathcal{F}$.


Exercise 1.2 Suppose that $\mathcal{F}$ is a σ-field containing all open
intervals in [0, 1] with rational endpoints. Show that $\mathcal{F}$ contains
all open intervals in [0, 1].


Before proceeding further we note some basic properties of probability 
measures. 


Theorem 1.11 
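If $P$ is a probability measure, then:
(i) $P(\bigcup_{i=1}^{n} A_i) = \sum_{i=1}^{n} P(A_i)$ for any pairwise disjoint
events $A_i \in \mathcal{F}$, $i = 1, 2, \ldots, n$ (finite additivity);
(ii) $P(\Omega \setminus A) = 1 - P(A)$ for any $A \in \mathcal{F}$; in
particular, $P(\emptyset) = 0$;
(iii) $A \subset B$ implies $P(A) \le P(B)$ for any $A, B \in \mathcal{F}$
(monotonicity);
(iv) $P(\bigcup_{i=1}^{n} A_i) \le \sum_{i=1}^{n} P(A_i)$ for any
$A_i \in \mathcal{F}$, $i = 1, 2, \ldots, n$ (finite subadditivity);
(v) if $A_{n+1} \supset A_n \in \mathcal{F}$ for all $n \ge 1$, then
$P(\bigcup_{n=1}^{\infty} A_n) = \lim_{n \to \infty} P(A_n)$;
(vi) $P(\bigcup_{i=1}^{\infty} A_i) \le \sum_{i=1}^{\infty} P(A_i)$ for any
$A_i \in \mathcal{F}$, $i = 1, 2, \ldots$ (countable subadditivity);
(vii) if $A_{n+1} \subset A_n \in \mathcal{F}$ for all $n \ge 1$, then
$P(\bigcap_{n=1}^{\infty} A_n) = \lim_{n \to \infty} P(A_n)$.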




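Proof (i) Let $A_{n+1} = A_{n+2} = \cdots = \emptyset$ and apply countable
additivity.

(ii) Use (i) with $n = 2$, $A_1 = A$, $A_2 = \Omega \setminus A$.

(iii) Since $B = A \cup (B \setminus A)$ and we have disjoint components, we can
apply (i), so

\[ P(B) = P(A) + P(B \setminus A) \ge P(A). \]

(iv) For $n = 2$,

\[ P(A_1 \cup A_2) = P(A_1 \cup (A_2 \setminus A_1)) = P(A_1) + P(A_2 \setminus A_1) \le P(A_1) + P(A_2) \]

and then use induction to complete the proof for arbitrary $n$, where the
induction step will be the same as the above argument.

(v) Using the above properties, we have (with $A_0 = \emptyset$)

\[
\begin{aligned}
P\Big(\bigcup_{n=1}^{\infty} A_n\Big)
  &= P\Big(\bigcup_{n=0}^{\infty} (A_{n+1} \setminus A_n)\Big)
   = \sum_{n=0}^{\infty} P(A_{n+1} \setminus A_n) \\
  &= \lim_{m \to \infty} \sum_{n=0}^{m} P(A_{n+1} \setminus A_n)
   = \lim_{m \to \infty} P\Big(\bigcup_{n=0}^{m} (A_{n+1} \setminus A_n)\Big) \\
  &= \lim_{m \to \infty} P(A_{m+1}).
\end{aligned}
\]

(vi) We put $B_n = \bigcup_{i=1}^{n} A_i$, so that
$B_{n+1} \supset B_n \in \mathcal{F}$ for all $n \ge 1$, and using (v) we pass
to the limit in the finite subadditivity relation (iv):

\[
\begin{aligned}
P\Big(\bigcup_{n=1}^{\infty} A_n\Big)
  &= P\Big(\bigcup_{n=1}^{\infty} B_n\Big)
   = \lim_{m \to \infty} P(B_m)
   = \lim_{m \to \infty} P\Big(\bigcup_{n=1}^{m} A_n\Big) \\
  &\le \lim_{m \to \infty} \sum_{n=1}^{m} P(A_n)
   = \sum_{n=1}^{\infty} P(A_n).
\end{aligned}
\]

(vii) Take $A = \bigcap_{n=1}^{\infty} A_n$, note that
$P(A) = 1 - P(\Omega \setminus A)$ by (ii), and apply (v):

\[
P(A) = 1 - P\Big(\bigcup_{n=1}^{\infty} (\Omega \setminus A_n)\Big)
     = 1 - \lim_{n \to \infty} P(\Omega \setminus A_n)
     = \lim_{n \to \infty} P(A_n).
\]

□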
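Properties (iv) and (v) can be observed numerically on the geometric probability of Example 1.9. The sketch below is only an illustration: the value `p = 0.3` and the particular events are assumptions chosen here, and floating-point sums stand in for the exact series.

```python
def geometric_mass(n, p):
    """p_n = (1 - p)**(n - 1) * p, the geometric probability of {n}."""
    return (1 - p) ** (n - 1) * p


def P(event, p):
    """Probability of a finite event, as a sum of point masses."""
    return sum(geometric_mass(n, p) for n in event)


p = 0.3  # hypothetical probability of an upward jump

# Increasing events A_n = {1, ..., n}; their union is the whole sample
# space, so by (v) the probabilities P(A_n) should increase towards 1.
increasing = [P(range(1, n + 1), p) for n in (1, 5, 20, 80)]

# Overlapping events, for which (iv) gives an inequality, not an equality.
A, B = {1, 2, 3}, {3, 4}
union_prob = P(A | B, p)
sum_prob = P(A, p) + P(B, p)
```

The gap between `sum_prob` and `union_prob` is exactly the probability of the overlap $A \cap B = \{3\}$, which is counted twice in the sum.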


The construction of interesting probability measures requires some labour, 
as we shall see. However, a few simple examples can be given immediately. 




Example 1.12 

Take any non-empty set $\Omega$, fix $\omega \in \Omega$, and define
$\delta_\omega(A) = 1$ if $\omega \in A$ and $\delta_\omega(A) = 0$ if
$\omega \notin A$, for any $A \subset \Omega$. It is a probability measure,
called the unit mass, also known as the Dirac measure, concentrated at $\omega$.
If $\mathcal{F}$ is taken to be the family of all subsets of $\Omega$, then
$(\Omega, \mathcal{F}, \delta_\omega)$ is a probability space.


Example 1.13 
Let $N$ be a positive integer. On $\Omega = \{0, 1, \ldots, N\}$ define

\[ P(A) = \sum_{n=0}^{N} \binom{N}{n} \frac{1}{2^N} \delta_n(A) \]

for any $A \subset \Omega$, where $\delta_n$ is the unit mass concentrated at $n$
from Example 1.12. We take $\mathcal{F}$ to be the family of all subsets of
$\Omega$. Then $(\Omega, \mathcal{F}, P)$ is a probability space. This is clearly
the symmetric binomial probability considered earlier.
More generally for the same $\Omega$ and any $p \in (0, 1)$, the binomial
probability with parameters $N, p$ is defined by setting

\[ P(A) = \sum_{n=0}^{N} \binom{N}{n} p^n (1 - p)^{N-n} \delta_n(A). \]

It is immediate from the binomial theorem that $P(\Omega) = 1$.

This example is often described as providing the probabilities of events
relating to the repeated tossing of a coin (where successive tosses are
assumed not to affect each other, in a sense that will be made precise later):
if for any given toss the probability of ‘Heads’ is $p$, the probability of
finding exactly $k$ ‘Heads’ in $N$ tosses is $\binom{N}{k} p^k (1 - p)^{N-k}$.


Example 1.14 
Fix $\lambda > 0$, let $\Omega = \{0, 1, 2, \ldots\}$ and let $\mathcal{F}$ be the
family of all subsets of $\Omega$. For any $A \in \mathcal{F}$ put

\[ P(A) = \sum_{n=0}^{\infty} \frac{\lambda^n}{n!} e^{-\lambda} \delta_n(A), \]

where $\delta_n$ is the unit mass concentrated at $n$. Then
$(\Omega, \mathcal{F}, P)$ is a probability space. This gives the Poisson
probability mentioned in Example 1.8.


In addition to subsets of $\mathbb{R}$, for example the set $[0, \infty)$ of all
non-negative real numbers, it often proves convenient to consider sets
containing $\infty$ or $-\infty$ in addition to real numbers. For instance, we
write $[-\infty, \infty]$ for the set of all real numbers in $\mathbb{R}$
together with $\infty$ and $-\infty$, and $[0, \infty]$ to denote the set of all
non-negative real numbers together with $\infty$.

Probability measures belong to a wider class of countably additive set
functions taking values in $[0, \infty]$. Let $\Omega$ be a non-empty set and let
$\mathcal{F}$ be a σ-field of subsets of $\Omega$.


Definition 1.15 
We say that $\mu : \mathcal{F} \to [0, \infty]$ is a measure and call
$(\Omega, \mathcal{F}, \mu)$ a measure space if

(i) $\mu(\emptyset) = 0$;

(ii) for any pairwise disjoint sets $A_i \in \mathcal{F}$, $i = 1, 2, \ldots$

\[ \mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \mu(A_i). \]

Note that some of the terms $\mu(A_i)$ in the sum may be infinite, and we use
the convention $x + \infty = \infty$ for any $x \in [0, \infty]$.
Moreover, we call $\mu$ a finite measure if, in addition, $\mu(\Omega) < \infty$.


The properties listed in Theorem 1.11 and their proofs can readily be 
adapted to the case of an arbitrary measure. 


Corollary 1.16 

Properties (i) and (iii)–(vi) listed in Theorem 1.11 remain true for any
measure $\mu$. If we assume in addition that $\mu(\Omega) < \infty$, then (ii)
becomes $\mu(\Omega \setminus A) = \mu(\Omega) - \mu(A)$. Moreover, if
$\mu(A_1)$ is finite, then (vii) still holds.


Example 1.17 
For any non-empty set $\Omega$ and any $A \subset \Omega$ let

\[ \mu(A) = \sum_{\omega \in A} \delta_\omega(A), \]

where $\delta_\omega$ is the unit mass concentrated at $\omega$. The sum is equal
to the number of elements in $A$ if $A$ is a finite set, and $\infty$ otherwise.
Then $\mu$ is a measure, called the counting measure, on $\Omega$ defined on the
family $\mathcal{F}$ consisting of all subsets of $\Omega$. It is not a
probability measure, however, unless $\Omega$ is a one-element set.
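The counting measure also shows why property (vii) of Theorem 1.11 needs a finiteness assumption when it is carried over to general measures. The following is a standard illustration, sketched here for the counting measure $\mu$ on $\Omega = \{1, 2, \ldots\}$; the sets $A_n$ are chosen for this illustration and do not come from the text.

```latex
\[
A_n = \{n, n+1, n+2, \ldots\}, \qquad
A_{n+1} \subset A_n, \qquad
\mu(A_n) = \infty \quad \text{for all } n \ge 1,
\]
\[
\bigcap_{n=1}^{\infty} A_n = \emptyset, \qquad
\mu\Big(\bigcap_{n=1}^{\infty} A_n\Big) = 0
\;\ne\; \infty = \lim_{n \to \infty} \mu(A_n).
\]
```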


1.3 Lebesgue measure 


The discrete stock price models in Section 1.1 admit only a limited range of 
prices. To remove this restriction it is natural to allow future prices to take 
any value in some interval in R. Probability spaces capable of capturing 
this modelling choice require the notion of Lebesgue measure, introduced 
in this section. In particular, it will facilitate a study of log-normally dis- 
tributed stock prices and it will prove instrumental in the development of 
stochastic calculus, which is of fundamental importance in mathematical 
finance. 

To begin with, take any open interval $I = (a, b) \subset \mathbb{R}$, where
$a \le b$. We denote the length of the interval by

\[ l(I) = b - a. \]

The family of such intervals will be denoted by $\mathcal{I}$. Observe that
$\emptyset = (a, a) \in \mathcal{I}$, and that $l(\emptyset) = 0$.

However, $\mathcal{I}$ is not a σ-field, and the length $l$ as a function
defined on $\mathcal{I}$ is not a measure. Can this function be extended to a
larger domain, a σ-field containing $\mathcal{I}$, on which it will become a
measure? The answer to this question is positive, as we shall see in
Theorem 1.19, but not immediately obvious.

First of all, we need to identify the σ-field to provide the domain of such
a measure. To make the task of extending the length function easier, we
want the σ-field to be as small as possible, as long as it contains
$\mathcal{I}$. The intersection of all σ-fields containing $\mathcal{I}$,
denoted by

\[ \mathcal{B}(\mathbb{R}) = \bigcap \{\mathcal{F} : \mathcal{F} \text{ is a } \sigma\text{-field on } \mathbb{R} \text{ and } \mathcal{I} \subset \mathcal{F}\} \tag{1.1} \]

and called the family of Borel sets in $\mathbb{R}$, is the smallest σ-field
containing $\mathcal{I}$, as shown in the next exercise.




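Exercise 1.3 Show that:
(1) $\mathcal{B}(\mathbb{R})$ is a σ-field on $\mathbb{R}$ such that
$\mathcal{I} \subset \mathcal{B}(\mathbb{R})$;
(2) if $\mathcal{F}$ is a σ-field on $\mathbb{R}$ such that
$\mathcal{I} \subset \mathcal{F}$, then
$\mathcal{B}(\mathbb{R}) \subset \mathcal{F}$.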


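We could equally have begun with closed intervals $[a, b]$ since this class
of intervals leads to the same σ-field $\mathcal{B}(\mathbb{R})$. To see this we
only need to note that
$[a, b] = \bigcap_{n=1}^{\infty} (a - \frac{1}{n}, b + \frac{1}{n})$ and
$(a, b) = \bigcup_{n=1}^{\infty} [a + \frac{1}{n}, b - \frac{1}{n}]$.

In particular, singleton sets $\{a\} = [a, a]$ belong to
$\mathcal{B}(\mathbb{R})$ for all $a \in \mathbb{R}$, and so do all finite or
countable subsets of $\mathbb{R}$. Hence the set $\mathbb{Q}$ of rationals and
its complement $\mathbb{R} \setminus \mathbb{Q}$, the set of irrationals, also
belong to $\mathcal{B}(\mathbb{R})$.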


Definition 1.18 
For each $A \in \mathcal{B}(\mathbb{R})$ we put

\[ m(A) = \inf \Big\{ \sum_{k=1}^{\infty} l(J_k) : A \subset \bigcup_{k=1}^{\infty} J_k \Big\}, \tag{1.2} \]

where the infimum is taken over all sequences $(J_k)_{k=1}^{\infty}$ consisting
of open intervals. We call $m$ the Lebesgue measure defined on
$\mathcal{B}(\mathbb{R})$.


When $A \subset \bigcup_{k=1}^{\infty} J_k$ we say that $A$ is covered by the
sequence of sets $(J_k)_{k=1}^{\infty}$. The idea is to cover $A$ by a sequence
of open intervals, consider the total length of these intervals as an
overestimate of the measure of $A$, and take the infimum of the total length
over all such coverings.
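To get a feeling for coverings, the sketch below covers finitely many rationals in [0, 1] by open intervals whose total length stays below a prescribed `eps`. It is an illustration only: the enumeration order, the interval lengths `eps / 2**k` and the function names are choices made here, and a finite computation does not replace the argument needed for a whole countable set.

```python
from fractions import Fraction


def rationals_in_unit_interval(count):
    """First `count` distinct rationals in [0, 1], ordered by denominator."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out


def cover(points, eps):
    """Open interval J_k of length eps / 2**k centred at the k-th point."""
    return [(x - eps / 2 ** (k + 1), x + eps / 2 ** (k + 1))
            for k, x in enumerate(points, start=1)]


eps = Fraction(1, 100)
points = rationals_in_unit_interval(200)
intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)
```

Because the lengths form a geometric sequence, the total length remains below `eps` no matter how many points are covered, which is the reason this choice of intervals is convenient.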


Theorem 1.19 
$m : \mathcal{B}(\mathbb{R}) \to [0, \infty]$ is a measure such that $m((a, b)) = b - a$ for all $a \le b$.


The proof can be found in Section 1.5. The details are not needed in 
the rest of this volume, but any serious student of mathematics applied in 
modern finance should have seen them at least once! 


Remark 1.20 
Lebesgue measure can be defined on a σ-field larger than
$\mathcal{B}(\mathbb{R})$, but cannot be extended to a measure defined on all
subsets of $\mathbb{R}$.¹


Exercise 1.4 Find m (|4, 2)) and m ({-2, 3] U [3, 8). 


¹ See, for example, M. Capiński and E. Kopp, Measure, Integral and Probability,
2nd edition, Springer-Verlag 2004.




Exercise 1.5 Compute m (Ux. (4. 1). 


Exercise 1.6 Find $m(\mathbb{N})$, $m(\mathbb{Q})$, $m(\mathbb{R} \setminus \mathbb{Q})$, $m(\{x \in \mathbb{R} : \sin x = \cos x\})$.


Exercise 1.7 Show that the Cantor set $C$, constructed below, is
uncountable, but that $m(C) = 0$.

The Cantor set is defined to be $C = \bigcap_{n=0}^{\infty} C_n$, where
$C_0 = [0, 1]$, $C_1$ is obtained by removing from $C_0$ the ‘middle third’
$(\frac{1}{3}, \frac{2}{3})$ of the interval $[0, 1]$, $C_2$ is formed by
similarly removing from $C_1$ the ‘middle thirds’ $(\frac{1}{9}, \frac{2}{9})$,
$(\frac{7}{9}, \frac{8}{9})$ of the two intervals $[0, \frac{1}{3}]$,
$[\frac{2}{3}, 1]$, and so on. The set $C_n$ consists of $2^n$ closed
intervals, each of length $(\frac{1}{3})^n$.
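The construction of the sets $C_n$ is easy to carry out with exact fractions. The sketch below (function names are illustrative) builds the closed intervals of $C_n$ and records their total length, which may help in forming a conjecture before attempting the exercise; it is not a proof.

```python
from fractions import Fraction


def cantor_step(intervals):
    """Remove the open middle third from each closed interval [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out


def cantor_level(n):
    """Closed intervals making up C_n, starting from C_0 = [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = cantor_step(intervals)
    return intervals


C2 = cantor_level(2)
# Total length of C_n for n = 0, ..., 5.
lengths = [sum(b - a for a, b in cantor_level(n)) for n in range(6)]
```

Each step keeps two thirds of the previous total length, so `lengths` is the sequence of powers of 2/3.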


Exercise 1.8 Show that for any $A \in \mathcal{B}(\mathbb{R})$ and for any
$x \in \mathbb{R}$

\[ m(A) = m(A + x), \]

where $A + x = \{a + x \in \mathbb{R} : a \in A\}$. This property of Lebesgue
measure is called translation invariance.


Lebesgue measure allows us to define a probability on any bounded
interval $\Omega = [a, b]$ by writing, for any Borel set $A \subset [a, b]$,

\[ P(A) = \frac{m(A)}{m(\Omega)}. \]

This is called the uniform probability on $[a, b]$.


1.4 Lebesgue integral 


As we noticed in the discrete case, uniform probability does not lend itself
well to modelling stock prices. Similarly, in the continuous case, we need
more than just the uniform probability on an interval. A natural idea is to
replace the sum $P(A) = \sum_{\omega \in A} p_\omega$, used in the discrete case
to express the probability of an event $A$, by an integral understood in an
appropriate sense. The simplest case is that of an integral of a continuous
function on $\mathbb{R}$ when $A = [a, b]$ is an interval. With this in mind, we
briefly review some basic facts concerning integrals of continuous functions.


Figure 1.3 Approximating the area under the graph of f by rectangles.


Riemann integral 


Let $f : [a, b] \to \mathbb{R}$ be a continuous function. In this case $f$ must
be bounded, so the area under the graph of $f$ is finite. To approximate this
area we divide it into strips by choosing a sequence of numbers
$a = c_0 < c_1 < \cdots < c_n = b$ and approximate each strip by a rectangle. We
take the height of such a rectangle with base $[c_{i-1}, c_i]$ to be $f(x_i)$
for some $x_i \in [c_{i-1}, c_i]$, see Figure 1.3. The total area of the
rectangles is

\[ S_n = \sum_{i=1}^{n} f(x_i)(c_i - c_{i-1}). \]

Let $\delta_n = \max_{i=1,\ldots,n} |c_i - c_{i-1}|$. The sequence $S_n$ for
$n = 1, 2, \ldots$ converges to a limit independent of the way the $c_i$ and
$x_i$ are selected, as long as $\lim_{n \to \infty} \delta_n = 0$. We call this
limit the Riemann integral of $f$ over $[a, b]$ and denote it by

\[ \int_a^b f(x)\,dx = \lim_{n \to \infty} S_n. \]

The integral $\int_a^b f(x)\,dx$ exists and is finite for any continuous
function $f$. The same applies to bounded functions having at most a countable
number of points of discontinuity.
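The independence of the limit from the choice of the points $x_i$ can be observed numerically. The sketch below is an illustration under assumptions made here: a uniform partition of $[0, 1]$, the test function $f(x) = x^2$ (whose integral over $[0, 1]$ is $1/3$), and three hypothetical rules for picking $x_i$.

```python
import random


def riemann_sum(f, a, b, n, pick):
    """S_n = sum of f(x_i) * (c_i - c_{i-1}) on a uniform partition.

    `pick(left, right)` selects the point x_i inside [c_{i-1}, c_i].
    """
    width = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        left, right = a + (i - 1) * width, a + i * width
        total += f(pick(left, right)) * (right - left)
    return total


def f(x):
    return x * x  # continuous on [0, 1]


rng = random.Random(0)  # fixed seed so that runs are repeatable
rules = {
    "left": lambda left, right: left,
    "right": lambda left, right: right,
    "random": lambda left, right: rng.uniform(left, right),
}
sums = {name: riemann_sum(f, 0.0, 1.0, 10_000, pick)
        for name, pick in rules.items()}
```

Since $f$ is increasing on $[0, 1]$, the left-endpoint and right-endpoint sums bracket every other choice of the $x_i$, and all three values approach the same limit as the partition is refined.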

There are, however, some fairly obvious functions for which the Rie- 
mann integral cannot be defined. 




Example 1.21 
Consider the function $f : \mathbb{R} \to [0, \infty)$ defined as

\[ f(x) = \begin{cases} 0 & \text{if } x \in \mathbb{Q}, \\ 1 & \text{if } x \in \mathbb{R} \setminus \mathbb{Q}. \end{cases} \]

Fix a sequence $0 = c_0 < c_1 < \cdots < c_n = 1$ of points in the interval
$[0, 1]$. Each subinterval $[c_{i-1}, c_i]$ contains both rational and irrational
numbers, so taking the $x_i$ to be rationals we get

\[ \sum_{i=1}^{n} f(x_i)(c_i - c_{i-1}) = 0, \]

while for irrational $x_i$ we get

\[ \sum_{i=1}^{n} f(x_i)(c_i - c_{i-1}) = 1. \]

As $n$ approaches infinity we can get a different limit (or in fact no limit
at all), depending on the choice of the $x_i$, which means that the Riemann
integral $\int_0^1 f(x)\,dx$ does not exist.


The following result captures the relationship between derivatives and 
Riemann integrals. 


Theorem 1.22 
Let $f : [a, b] \to \mathbb{R}$ be a continuous function. Then we have the
following.
(i) The function defined for any $x \in [a, b]$ by

\[ F(x) = \int_a^x f(y)\,dy \]

is differentiable and its derivative at any $x \in [a, b]$ (at $a$ and $b$ we
take right- or left-sided derivatives, respectively) is

\[ F'(x) = f(x). \tag{1.3} \]

(ii) For any function $F : [a, b] \to \mathbb{R}$ satisfying (1.3)

\[ \int_a^b f(x)\,dx = F(b) - F(a). \]


A function $F$ satisfying (1.3) is called an antiderivative of $f$. Such a
function is unique up to a constant.


Figure 1.4 Normal density.


Some technicalities are involved to justify these claims about the Rie- 
mann integral, of course, but they are not needed in what follows. These 
elementary techniques of integration form part of any calculus course. 

The Riemann integral makes it possible to describe the probability of
any event represented by an interval $A = [a, b]$ as the integral

\[ P(A) = \int_a^b f(x)\,dx \tag{1.4} \]

of a continuous (or piecewise continuous) function
$f : \mathbb{R} \to [0, \infty)$, as long as

\[ \int_{-\infty}^{\infty} f(x)\,dx = 1. \tag{1.5} \]

Here we have used the improper Riemann integral, defined as
$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{a \to \infty} \int_{-a}^{a} f(x)\,dx$.


Example 1.23 
Take

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \tag{1.6} \]

where $\mu \in \mathbb{R}$ and $\sigma > 0$ are parameters. This is called a normal (or Gaussian) density.

In Figure 1.4 we sketch the graph for $\mu = 10$ and $\sigma = 2.236$. As we shall
see in Example 5.54, this choice of parameters is related to Example 1.6,
where the stock prices change in an additive manner over 20 time steps,
which was illustrated in Figure 1.1.
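A numerical sanity check of the normalisation condition (1.5) for these parameters may be helpful. The sketch below assumes that (1.6) is the usual normal density formula; the helper names and the truncation of the real line to $\mu \pm 12\sigma$ are our own choices, not part of the text. It also evaluates the probability of negative values using the complementary error function.

```python
import math

def normal_density(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def midpoint_integral(g, a, b, points=100_000):
    """Midpoint Riemann sum of g over [a, b]."""
    h = (b - a) / points
    return h * sum(g(a + (k + 0.5) * h) for k in range(points))

mu, sigma = 10.0, 2.236
# Truncating to mu +/- 12 sigma is an assumption; the tails beyond are negligible.
total = midpoint_integral(lambda x: normal_density(x, mu, sigma),
                          mu - 12 * sigma, mu + 12 * sigma)
# Probability of (-inf, 0): Phi(-mu/sigma) = erfc(mu / (sigma*sqrt(2))) / 2.
p_negative = 0.5 * math.erfc(mu / (sigma * math.sqrt(2)))
print(total, p_negative)
```

The total is 1 up to rounding, while the probability of a negative value is small but strictly positive.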


Figure 1.5 Log-normal density.


From the standpoint of modelling stock prices a disadvantage of the nor- 
mal density is that the probability of negative values is non-zero, which is 
hardly acceptable when modelling prices even if, as the graph suggests, 
this probability may be small. 


Exercise 1.9 Verify that f given by (1.6) satisfies (1.5). 


Example 1.24 
An example related to the multiplicative changes of stock prices discussed 
in Example 1.7 is based on the choice of 


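\[ f(x) = \begin{cases} \dfrac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}} & \text{for } x > 0, \\ 0 & \text{for } x \le 0, \end{cases} \tag{1.7} \]

called a log-normal density.

The graph for $\mu = 2.2776$ and $\sigma = 0.2238$ is depicted in Figure 1.5.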
Compare this with Figure 1.2. These values are related to those in Exam- 
ple 1.7, as will be explained in Example 5.55. 

Negative prices are excluded. The log-normal density is widely accepted 
by the financial community as providing a standard model of stock prices. 
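The same kind of check can be sketched for the log-normal density. We assume that (1.7) is the standard log-normal formula; the integration range $[0, 100]$ and the helper names are illustrative assumptions.

```python
import math

def lognormal_density(x, mu, sigma):
    """Log-normal density: zero for x <= 0, so negative prices carry no mass."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

def midpoint_integral(g, a, b, points=200_000):
    """Midpoint Riemann sum of g over [a, b]."""
    h = (b - a) / points
    return h * sum(g(a + (k + 0.5) * h) for k in range(points))

mu, sigma = 2.2776, 0.2238
# The mass beyond x = 100 is negligible for these parameters (an assumption
# justified by ln(100) lying more than 10 sigma above mu).
total = midpoint_integral(lambda x: lognormal_density(x, mu, sigma), 0.0, 100.0)
print(total, lognormal_density(-1.0, mu, sigma))
```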




Exercise 1.10 Verify that f given by (1.7) satisfies (1.5). 


Integral with respect to a measure 


Difficulties arise when $P$ given by the Riemann integral (1.4) needs to be
extended from intervals to a measure on a $\sigma$-field. To minimize effort, we
take the smallest $\sigma$-field containing intervals, that is, the $\sigma$-field of Borel
sets $\mathcal{B}(\mathbb{R})$, just like we did when introducing Lebesgue measure $m$.

We outline the constructions involved in this extension, leaving routine
verifications as exercises. While we aim primarily to construct an integral
on the measure space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), m)$, this is just as easy on an arbitrary
measure space $(\Omega, \mathcal{F}, \mu)$, which is what we will now do, thereby achieving
greater generality.

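First we integrate so-called simple functions. The indicator function of
any $A \subset \Omega$ will be denoted by

\[ \mathbf{1}_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \in \Omega \setminus A. \end{cases} \]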


Definition 1.25 
By definition a (non-negative) simple function has the form

\[ s = \sum_{i=1}^n s_i \mathbf{1}_{A_i}, \]

where $A_1, \ldots, A_n \in \mathcal{F}$ are pairwise disjoint sets with $\bigcup_{i=1}^n A_i = \Omega$, and
where $s_i \ge 0$ for $i = 1, \ldots, n$. We define

\[ \int_\Omega s\,d\mu = \sum_{i=1}^n s_i \mu(A_i). \]

It may happen that $\mu(A_i) = \infty$ for some $i$, and then the conventions $0 \cdot \infty = 0$
and $x \cdot \infty = \infty$ for $x > 0$ are used.
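As an illustration of Definition 1.25, here is a minimal sketch that evaluates the integral of a simple function from its values $s_i$ and the measures $\mu(A_i)$. The function name and the list-based representation are assumptions made for the example; the explicit test for zero implements the convention $0 \cdot \infty = 0$, which ordinary floating-point arithmetic does not follow.

```python
import math

def integrate_simple(values, measures):
    """Integral of s = sum_i s_i 1_{A_i}, given s_i and mu(A_i) as parallel lists."""
    total = 0.0
    for s_i, mu_i in zip(values, measures):
        if s_i == 0 or mu_i == 0:
            continue  # convention 0 * inf = 0 (float arithmetic would give nan)
        total += s_i * mu_i
    return total

# s = 2 on [0, 1), 5 on [1, 3) and 0 elsewhere, with Lebesgue measure:
# the three sets have measures 1, 2 and infinity.
print(integrate_simple([2.0, 5.0, 0.0], [1.0, 2.0, math.inf]))  # 2*1 + 5*2 = 12.0
```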


Exercise 1.11 Show that if $r, s$ are any non-negative simple functions
and $a, b \ge 0$ are any non-negative numbers, then $ar + bs$ is a simple
function and

\[ \int_\Omega (ar + bs)\,d\mu = a \int_\Omega r\,d\mu + b \int_\Omega s\,d\mu. \]




Exercise 1.12 Show that if $r, s$ are non-negative simple functions
such that $r \le s$, then

\[ \int_\Omega r\,d\mu \le \int_\Omega s\,d\mu. \]


Proposition 1.26 
Suppose that $s$ is a non-negative simple function. Then $s\mathbf{1}_B$ is also a
non-negative simple function for any $B \in \mathcal{F}$, and

\[ \nu(B) = \int_\Omega s\mathbf{1}_B\,d\mu \]

is a measure on $\Omega$ defined on the $\sigma$-field $\mathcal{F}$.


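Proof Take any $B \in \mathcal{F}$, and let $s = \sum_{i=1}^n s_i \mathbf{1}_{A_i}$, where $A_1, \ldots, A_n \in \mathcal{F}$ are
pairwise disjoint sets with $\bigcup_{i=1}^n A_i = \Omega$, and $s_i \ge 0$ for $i = 1, \ldots, n$. Then
$s\mathbf{1}_B = \sum_{i=1}^{n+1} r_i \mathbf{1}_{C_i}$, where $r_i = s_i$, $C_i = A_i \cap B$ for $i = 1, \ldots, n$ and where
$r_{n+1} = 0$, $C_{n+1} = \Omega \setminus B$, is also a simple function. Moreover,

\[ \nu(B) = \int_\Omega s\mathbf{1}_B\,d\mu = \sum_{i=1}^{n+1} r_i \mu(C_i) = \sum_{i=1}^n s_i \mu(A_i \cap B). \]

In particular, for $B = \emptyset$

\[ \nu(\emptyset) = \sum_{i=1}^n s_i \mu(\emptyset) = 0. \]

Now suppose that $B = \bigcup_{j=1}^\infty B_j$, where $B_j \in \mathcal{F}$ for $j = 1, 2, \ldots$ are pairwise
disjoint sets. Then countable additivity of $\mu$ gives

\[ \nu(B) = \sum_{i=1}^n s_i \mu(A_i \cap B) = \sum_{i=1}^n s_i \sum_{j=1}^\infty \mu(A_i \cap B_j)
= \sum_{j=1}^\infty \sum_{i=1}^n s_i \mu(A_i \cap B_j) = \sum_{j=1}^\infty \nu(B_j). \]

We have proved that $\nu$ is a measure. □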


Next, we need to identify the class of functions that will be integrated
with respect to a measure. First we introduce some notation. For any $B \subset \mathbb{R}$
we denote by $\{f \in B\}$ or by $f^{-1}(B)$ the inverse image of $B$ under $f$, that is,

\[ \{f \in B\} = f^{-1}(B) = \{\omega \in \Omega : f(\omega) \in B\}. \]




The notation extends to the case when the set of values of $f$ is specified in
terms of a certain property, for example, $\{f > a\} = \{\omega \in \Omega : f(\omega) > a\}$ is
the inverse image of $(a, \infty)$ under $f$.


Definition 1.27 
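We say that a function $f : \Omega \to [-\infty, \infty]$ is measurable (more precisely,
measurable with respect to $\mathcal{F}$ or $\mathcal{F}$-measurable) if

\[ \{f \in B\} \in \mathcal{F} \quad \text{for all } B \in \mathcal{B}(\mathbb{R}) \]

and

\[ \{f = \infty\},\ \{f = -\infty\} \in \mathcal{F}. \]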


Exercise 1.13 Show that $f : \Omega \to \mathbb{R}$ is measurable if and only if
$\{f > a\} \in \mathcal{F}$ for all $a \in \mathbb{R}$.


Exercise 1.14 Show that every simple function is measurable. 


Exercise 1.15 Show that the composition $g \circ f$ of a measurable function
$f : \Omega \to \mathbb{R}$ with a continuous function $g : \mathbb{R} \to \mathbb{R}$ is measurable.


Exercise 1.16 If $f_k$ are measurable functions for $k = 1, \ldots, n$, show
that $\max\{f_1, \ldots, f_n\}$ and $\min\{f_1, \ldots, f_n\}$ are also measurable.


Exercise 1.17 If $f_n$ are measurable functions for $n = 1, 2, \ldots$, show
that $\sup_{n \ge 1} f_n$ and $\inf_{n \ge 1} f_n$ are also measurable.


Exercise 1.18 Suppose that $f_n$ are measurable functions for $n = 1, 2, \ldots$.
Recall that, by definition,

\[ \limsup_{n\to\infty} f_n = \inf_{k \ge 1} \Big( \sup_{n \ge k} f_n \Big), \qquad
\liminf_{n\to\infty} f_n = \sup_{k \ge 1} \Big( \inf_{n \ge k} f_n \Big). \]

Show that $\limsup_{n\to\infty} f_n$ and $\liminf_{n\to\infty} f_n$ are also measurable functions.




Exercise 1.19 For any sequence of measurable functions $f_n$ for $n = 1, 2, \ldots$,
show that $\lim_{n\to\infty} f_n$ is also a measurable function.


Proposition 1.28 
Let $f : \Omega \to [0, \infty]$. The following conditions are equivalent:
(i) $f$ is a measurable function;
(ii) there is a non-decreasing sequence of non-negative simple functions
$s_n$, $n = 1, 2, \ldots$ such that $f = \lim_{n\to\infty} s_n$.


Proof (i) $\Rightarrow$ (ii) Let $f : \Omega \to [0, \infty]$ be a measurable function. For each
$n = 1, 2, \ldots$ we put

\[ s_n = \sum_{i=0}^{n2^n} \frac{i}{2^n} \mathbf{1}_{A_{i,n}}, \tag{1.8} \]

where $A_{i,n} = \{i2^{-n} \le f < (i+1)2^{-n}\}$. This defines a non-decreasing
sequence of simple functions such that $f = \lim_{n\to\infty} s_n$ (see Exercise 1.20).
(ii) $\Rightarrow$ (i) This is a consequence of Exercises 1.14 and 1.19. □


Exercise 1.20 Verify that (1.8) defines a non-decreasing sequence of
non-negative simple functions such that $f = \lim_{n\to\infty} s_n$.
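The idea behind the approximating sequence can be sketched numerically. The code below uses a common textbook variant, rounding a value down to a multiple of $2^{-n}$ and capping it at $n$; the cap is our assumption, made so that large and infinite values are handled, and formula (1.8) may differ from it in detail. The function name `dyadic_approx` is hypothetical.

```python
import math

def dyadic_approx(value, n):
    """n-th dyadic approximation of a value in [0, inf].

    Rounds down to a multiple of 2**-n and caps the result at n
    (the cap is an assumption of this sketch).
    """
    if value >= n:
        return float(n)
    return math.floor(value * 2 ** n) / 2 ** n

for value in (0.3, 2.89, math.pi, math.inf):
    print(value, [dyadic_approx(value, n) for n in range(1, 8)])
```

Applied pointwise to $f(\omega)$, this gives for each $n$ a function taking finitely many values on sets of the form $\{i2^{-n} \le f < (i+1)2^{-n}\}$ or $\{f \ge n\}$. The printed lists are non-decreasing and approach each finite value from below, while for an infinite value they grow without bound.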


Exercise 1.21 Show that if f, g are non-negative measurable func- 
tions and a, b > 0, then af + bg is measurable. 


Exercise 1.22 Show that if f, g are non-negative measurable func- 
tions, then fg is measurable. 


The next step is to define the integral of any non-negative measurable 
function. 


Definition 1.29 
For any non-negative measurable function $f : \Omega \to [0, \infty]$ the integral
of $f$ is defined as

\[ \int_\Omega f\,d\mu = \sup\left\{ \int_\Omega s\,d\mu : s \text{ is a simple function such that } s \le f \right\}. \]




If the supremum is finite, we say that f is integrable. 


Exercise 1.23 For any non-negative measurable functions $f, g$ such
that $f \le g$ show

\[ \int_\Omega f\,d\mu \le \int_\Omega g\,d\mu. \]


Remark 1.30 

For any non-negative simple function $s$, Exercise 1.23 implies that Definitions
1.25 and 1.29 of the integral give the same result, so the same notation
$\int_\Omega s\,d\mu$ can be used in both cases.


The monotone convergence theorem, stated here for non-negative mea- 
surable functions, is a standard tool for handling limit operations, some- 
thing that the integral with respect to a measure can tackle with remarkable 
ease. 


Theorem 1.31 (monotone convergence) 

If $f : \Omega \to [0, \infty]$ and $f_n : \Omega \to [0, \infty]$ for $n = 1, 2, \ldots$ is a non-decreasing
sequence of non-negative measurable functions such that $f = \lim_{n\to\infty} f_n$,
then $f$ is a non-negative measurable function and

\[ \int_\Omega f\,d\mu = \lim_{n\to\infty} \int_\Omega f_n\,d\mu. \]


Proof That $f$ is measurable follows from Exercise 1.19. We put
$L = \lim_{n\to\infty} \int_\Omega f_n\,d\mu$ for brevity. The limit exists (but may be equal to $\infty$) and
satisfies $L \le \int_\Omega f\,d\mu$ because $\int_\Omega f_n\,d\mu$ is a non-decreasing sequence and
$\int_\Omega f_n\,d\mu \le \int_\Omega f\,d\mu$ by Exercise 1.23.

To show that, on the other hand, $L \ge \int_\Omega f\,d\mu$ we take any non-negative
simple function $s$ such that $s \le f$. We also take any $\alpha \in (0, 1)$ and put
$B_n = \{f_n \ge \alpha s\}$. Because $f_n \ge f_n\mathbf{1}_{B_n} \ge \alpha s\mathbf{1}_{B_n}$, using Exercise 1.23 once
again, together with Exercise 1.11, we obtain

\[ \int_\Omega f_n\,d\mu \ge \int_\Omega \alpha s\mathbf{1}_{B_n}\,d\mu = \alpha \int_\Omega s\mathbf{1}_{B_n}\,d\mu = \alpha\nu(B_n) \]

for each $n$, where $\nu$ is the measure on $\mathcal{F}$ defined in Proposition 1.26. Since
$f_n$ is a non-decreasing sequence, it follows that $B_n \subset B_{n+1}$ for each $n$. Moreover,
since $\lim_{n\to\infty} f_n = f \ge s \ge \alpha s$ (with $s > \alpha s$ wherever $s > 0$), we have
$\bigcup_{n=1}^\infty B_n = \Omega$. Because $\nu$ is a measure, we therefore have

\[ L \ge \alpha \lim_{n\to\infty} \nu(B_n) = \alpha\nu(\Omega) = \alpha \int_\Omega s\,d\mu \]

from Theorem 1.11 (v) adapted to the case of a measure in Corollary 1.16.
This is so for any $\alpha \in (0, 1)$ and any simple function $s$ such that $s \le f$,
which implies that $L \ge \int_\Omega f\,d\mu$, completing the proof. □
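A concrete instance of monotone convergence can be checked numerically. As an assumed example (not taken from the text), take $f(x) = x^{-1/2}$ on $(0, 1]$ and the truncations $f_n = \min\{f, n\}$, for which a direct computation gives $\int f_n\,dm = 2 - 1/n$, increasing to $\int f\,dm = 2$. The sketch uses midpoint Riemann sums as a stand-in for the integrals of the bounded, continuous truncations.

```python
def midpoint_integral(g, a, b, points=100_000):
    """Midpoint Riemann sum of g over [a, b]."""
    h = (b - a) / points
    return h * sum(g(a + (k + 0.5) * h) for k in range(points))

def truncated(n):
    """f_n(x) = min(f(x), n), where f(x) = x**-0.5 on (0, 1]."""
    return lambda x: min(x ** -0.5, float(n))

def exact_integral(n):
    """Closed form of the integral of f_n over (0, 1]: n * n**-2 + 2 * (1 - 1/n)."""
    return 2.0 - 1.0 / n

for n in (1, 2, 5, 10, 50):
    print(n, midpoint_integral(truncated(n), 0.0, 1.0), exact_integral(n))
```

The integrals increase with $n$ and approach 2, even though $f$ itself is unbounded.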


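Exercise 1.24 Let $f_n$ for $n = 1, 2, \ldots$ be a sequence of non-negative
measurable functions. Show that $\sum_{n=1}^\infty f_n$ is a non-negative measurable
function and

\[ \int_\Omega \Big( \sum_{n=1}^\infty f_n \Big)\,d\mu = \sum_{n=1}^\infty \int_\Omega f_n\,d\mu. \]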


Proposition 1.32 
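Let $f, g : \Omega \to [0, \infty]$ be non-negative measurable functions and take any
$a, b \ge 0$. Then $af + bg$ is measurable and

\[ \int_\Omega (af + bg)\,d\mu = a \int_\Omega f\,d\mu + b \int_\Omega g\,d\mu, \]

where we use the conventions $0 \cdot \infty = 0$, $x \cdot \infty = \infty$ for any $x > 0$, and
$x + \infty = \infty$ for any $x \ge 0$.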


Proof According to Proposition 1.28, there are non-decreasing sequences
$r_n$ and $s_n$ of simple functions such that $f = \lim_{n\to\infty} r_n$ and $g = \lim_{n\to\infty} s_n$.
It follows that $ar_n + bs_n$ is a non-decreasing sequence of simple functions
and $af + bg = \lim_{n\to\infty}(ar_n + bs_n)$. By the monotone convergence theorem
(Theorem 1.31) and Exercise 1.11, it follows that

\[ \int_\Omega (af + bg)\,d\mu = \lim_{n\to\infty} \int_\Omega (ar_n + bs_n)\,d\mu
= a \lim_{n\to\infty} \int_\Omega r_n\,d\mu + b \lim_{n\to\infty} \int_\Omega s_n\,d\mu
= a \int_\Omega f\,d\mu + b \int_\Omega g\,d\mu. \]

□

The final step in the construction of the integral with respect to a measure
is to extend the definition from non-negative measurable functions to
arbitrary ones by integrating their positive and negative parts separately.




Definition 1.33 
Let $f : \Omega \to [-\infty, \infty]$ be a measurable function. If both the positive and
negative parts

\[ f^+ = \max\{f, 0\}, \qquad f^- = \max\{-f, 0\} \tag{1.9} \]

are integrable, we say that $f$ itself is integrable. When at least one of the
functions $f^+$, $f^-$ is integrable, we define the integral of $f$ as

\[ \int_\Omega f\,d\mu = \int_\Omega f^+\,d\mu - \int_\Omega f^-\,d\mu, \]

where the conventions $x + \infty = \infty$ and $x - \infty = -\infty$ for any $x \in \mathbb{R}$ apply
whenever one of the integrals on the right-hand side is equal to $\infty$. When
neither $f^+$ nor $f^-$ is integrable, the integral of $f$ remains undefined.
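The three cases in Definition 1.33 (finite integral, infinite integral, undefined integral) can be seen on a finite set. The sketch below assumes a measure given by point weights on finitely many points, possibly infinite; the names `times` and `integral` and the use of `None` for an undefined integral are our own choices.

```python
import math

def times(x, w):
    """Product x * w with the convention 0 * inf = 0."""
    return 0.0 if x == 0 or w == 0 else x * w

def integral(values, weights):
    """Integral of f on a finite set with point masses `weights`.

    Integrates the positive and negative parts separately and returns
    None when both are infinite, in which case the integral is undefined.
    """
    positive = sum(times(max(v, 0.0), w) for v, w in zip(values, weights))
    negative = sum(times(max(-v, 0.0), w) for v, w in zip(values, weights))
    if math.isinf(positive) and math.isinf(negative):
        return None
    return positive - negative

weights = [1.0, 2.0, math.inf]
print(integral([3.0, -1.0, 0.0], weights))           # 3*1 - 1*2 = 1.0
print(integral([3.0, -1.0, 2.0], weights))           # inf
print(integral([3.0, -1.0, -2.0], weights))          # -inf
print(integral([1.0, -1.0], [math.inf, math.inf]))   # None
```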


Exercise 1.25 Let $f$ be a measurable function. Show that $f$ is integrable
if and only if $|f|$ is.


Exercise 1.26 For any integrable function $f$ show that

\[ \left| \int_\Omega f\,d\mu \right| \le \int_\Omega |f|\,d\mu. \]


Exercise 1.27 Let $f, g$ be integrable functions and let $a, b \in \mathbb{R}$. Show
that $af + bg$ is integrable and

\[ \int_\Omega (af + bg)\,d\mu = a \int_\Omega f\,d\mu + b \int_\Omega g\,d\mu. \]


Exercise 1.28 For any integrable functions $f, g$ such that $f \le g$ show

\[ \int_\Omega f\,d\mu \le \int_\Omega g\,d\mu. \]


Exercise 1.29 Extend the monotone convergence theorem (Theorem 1.31)
to the case when $f_n$ for $n = 1, 2, \ldots$ is a non-decreasing
sequence of integrable functions and $f = \lim_{n\to\infty} f_n$ is also an
integrable function.


It proves convenient to consider the integral over any $B \in \mathcal{F}$ rather than
just over $\Omega$.


Definition 1.34
For any $B \in \mathcal{F}$ and any measurable $f : \Omega \to [-\infty, \infty]$ we define the
integral of $f$ over $B$ by

\[ \int_B f\,d\mu = \int_\Omega f\mathbf{1}_B\,d\mu \]

whenever the integral $\int_\Omega f\mathbf{1}_B\,d\mu$ exists (including the cases when it is $\infty$ or
$-\infty$), and we say that $f$ is integrable over $B$ whenever this integral is finite.


The following result is an extension of Proposition 1.26. 


Theorem 1.35 
Suppose that $f : \Omega \to [0, \infty]$ is measurable, and

\[ \nu(B) = \int_B f\,d\mu \]

for any $B \in \mathcal{F}$. Then $\nu$ is a measure on $\Omega$ defined on the $\sigma$-field $\mathcal{F}$.
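Proof Suppose that $B = \bigcup_{i=1}^\infty B_i$, where $B_i \in \mathcal{F}$ for $i = 1, 2, \ldots$ are
pairwise disjoint sets. Then

\[ g_n = f\mathbf{1}_{\bigcup_{i=1}^n B_i} = \sum_{i=1}^n f\mathbf{1}_{B_i} \]

is a non-decreasing sequence of measurable functions, and $\lim_{n\to\infty} g_n = f\mathbf{1}_B$.
It follows by the monotone convergence theorem (Theorem 1.31) that

\[ \nu(B) = \int_B f\,d\mu = \int_\Omega f\mathbf{1}_B\,d\mu = \int_\Omega \Big( \lim_{n\to\infty} g_n \Big)\,d\mu
= \lim_{n\to\infty} \int_\Omega g_n\,d\mu = \sum_{i=1}^\infty \int_\Omega f\mathbf{1}_{B_i}\,d\mu = \sum_{i=1}^\infty \nu(B_i). \]

Moreover, for $B = \emptyset$ we have $\mathbf{1}_\emptyset = 0$, so

\[ \nu(\emptyset) = \int_\Omega f\mathbf{1}_\emptyset\,d\mu = 0. \]

This completes the proof. □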


If a measurable function $f : \Omega \to \mathbb{R}$ has a property everywhere except
on some set $B \in \mathcal{F}$ such that $\mu(B) = 0$, we say that it has this property
$\mu$-almost everywhere ($\mu$-a.e. for short) or, particularly when $\mu$ is a
probability measure, $\mu$-almost surely ($\mu$-a.s. for short). For instance, if
$\mu(\{f \ne 0\}) = 0$, we say that $f = 0$, $\mu$-a.e.


Proposition 1.36 
Let $f : \Omega \to [0, \infty]$ be a non-negative measurable function. Then $f = 0$,
$\mu$-a.e. if and only if $\int_\Omega f\,d\mu = 0$.


Proof Suppose that $f = 0$, $\mu$-a.e., that is, $\mu(B) = 0$ for $B = \{f > 0\}$. For
every simple function $s$ such that $s \le f$ we have $s = 0$ on $\Omega \setminus B$. Writing
the simple function as $s = \sum_{i=1}^n s_i \mathbf{1}_{A_i}$, where $A_1, \ldots, A_n \in \mathcal{F}$ and $s_i \ge 0$
for $i = 1, \ldots, n$, we then have $\mu(A_i) = 0$ if $A_i \subset B$, and $s_i = 0$ otherwise.
This means that $\int_\Omega s\,d\mu = \sum_{i=1}^n s_i \mu(A_i) = 0$ because $\mu(A_i) = 0$ or $s_i = 0$ for
each $i$. Because $\int_\Omega s\,d\mu = 0$ for every simple function $s$ such that $s \le f$, it
follows that $\int_\Omega f\,d\mu = 0$.

Conversely, if $\int_\Omega f\,d\mu = 0$, then $B = \{f > 0\} = \bigcup_{n=1}^\infty B_n$, where
$B_n = \{f \ge \frac{1}{n}\}$. This is an increasing sequence of sets in $\mathcal{F}$, so
$\mu(B) = \lim_{n\to\infty} \mu(B_n)$. The simple function $s_n = \frac{1}{n}\mathbf{1}_{B_n}$ satisfies $s_n \le f$, so

\[ \frac{1}{n}\mu(B_n) = \int_\Omega s_n\,d\mu \le \int_\Omega f\,d\mu = 0, \]

which means that $\mu(B_n) = 0$ for all $n$, hence $\mu(B) = 0$. Therefore $f = 0$,
$\mu$-a.e. □


Exercise 1.30 Let $f : \Omega \to [-\infty, \infty]$ be a measurable function. Show
that $\int_B f\,d\mu = 0$ for every $B \in \mathcal{F}$ if and only if $f = 0$, $\mu$-a.e.


The origin of the next proposition is the familiar change of variables 
formula for transforming Riemann integrals. In the case of integrals with 
respect to a measure we have the following change of measure result. 


Proposition 1.37 

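Suppose that $(\Omega, \mathcal{F}, \mu)$ and $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde\mu)$ are measure spaces, and
$\varphi : \Omega \to \tilde\Omega$ is a function such that $\varphi^{-1}(A) \in \mathcal{F}$ and
$\mu(\varphi^{-1}(A)) = \tilde\mu(A)$ for every $A \in \tilde{\mathcal{F}}$. If $g = \tilde g \circ \varphi$ is the composition
of $\varphi$ and a measurable function $\tilde g : \tilde\Omega \to [-\infty, \infty]$, then
$g : \Omega \to [-\infty, \infty]$ is a measurable function, the integral $\int_\Omega g\,d\mu$ exists if and
only if $\int_{\tilde\Omega} \tilde g\,d\tilde\mu$ exists (including the cases when the integrals are equal to
$\infty$ or $-\infty$), and

\[ \int_\Omega g\,d\mu = \int_{\tilde\Omega} \tilde g\,d\tilde\mu. \]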
Q Q 


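Proof Suppose that $\tilde s = \sum_{i=1}^n s_i \mathbf{1}_{A_i}$ is a non-negative simple function on $\tilde\Omega$,
where $s_i \in [0, \infty)$ and where $A_i \in \tilde{\mathcal{F}}$ for $i = 1, \ldots, n$ are disjoint sets such
that $\bigcup_{i=1}^n A_i = \tilde\Omega$. Then $\varphi^{-1}(A_i) \in \mathcal{F}$ for $i = 1, \ldots, n$ are disjoint sets such
that $\bigcup_{i=1}^n \varphi^{-1}(A_i) = \Omega$, and $s = \tilde s \circ \varphi = \sum_{i=1}^n s_i \mathbf{1}_{\varphi^{-1}(A_i)}$ is a non-negative
simple function on $\Omega$. It follows that

\[ \int_\Omega s\,d\mu = \sum_{i=1}^n s_i \mu(\varphi^{-1}(A_i)) = \sum_{i=1}^n s_i \tilde\mu(A_i) = \int_{\tilde\Omega} \tilde s\,d\tilde\mu. \]

Now suppose that $\tilde g$ is a non-negative measurable function on $\tilde\Omega$. By
Proposition 1.28 there is a non-decreasing sequence of non-negative simple
functions $\tilde s_k$, $k = 1, 2, \ldots$ on $\tilde\Omega$ such that $\tilde g = \lim_{k\to\infty} \tilde s_k$. Then $s_k = \tilde s_k \circ \varphi$
is a non-decreasing sequence of non-negative simple functions on $\Omega$ and
$g = \lim_{k\to\infty} s_k$ is a measurable function on $\Omega$. It follows by the monotone
convergence theorem (Theorem 1.31) that

\[ \int_\Omega g\,d\mu = \lim_{k\to\infty} \int_\Omega s_k\,d\mu = \lim_{k\to\infty} \int_{\tilde\Omega} \tilde s_k\,d\tilde\mu = \int_{\tilde\Omega} \tilde g\,d\tilde\mu. \]

In general, if $\tilde g$ is a measurable function on $\tilde\Omega$, then $g = \tilde g \circ \varphi$ is a
measurable function on $\Omega$. We can write $\tilde g = \tilde g^+ - \tilde g^-$ and, correspondingly,
$g = g^+ - g^-$, where $\tilde g^+, \tilde g^-$ are non-negative measurable functions on $\tilde\Omega$ and
$g^+ = \tilde g^+ \circ \varphi$, $g^- = \tilde g^- \circ \varphi$ are non-negative measurable functions on $\Omega$. From
the above argument we know that

\[ \int_\Omega g^+\,d\mu = \int_{\tilde\Omega} \tilde g^+\,d\tilde\mu, \qquad \int_\Omega g^-\,d\mu = \int_{\tilde\Omega} \tilde g^-\,d\tilde\mu. \]

It follows that $\int_\Omega g\,d\mu$ exists if and only if $\int_{\tilde\Omega} \tilde g\,d\tilde\mu$ exists, and

\[ \int_\Omega g\,d\mu = \int_\Omega g^+\,d\mu - \int_\Omega g^-\,d\mu
= \int_{\tilde\Omega} \tilde g^+\,d\tilde\mu - \int_{\tilde\Omega} \tilde g^-\,d\tilde\mu = \int_{\tilde\Omega} \tilde g\,d\tilde\mu. \]

□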


Lebesgue integral 


Our aim when constructing an integral with respect to a measure was to extend
the Riemann integral. To achieve this, we now specialise to the measure
space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), m)$ with Lebesgue measure $m$ defined on the Borel sets
$\mathcal{B}(\mathbb{R})$.


Definition 1.38 
Let $f : \mathbb{R} \to [-\infty, \infty]$.
(i) We say that $f$ is Borel measurable whenever it is measurable as a
function on the measure space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), m)$.
(ii) The integral $\int_{\mathbb{R}} f\,dm$, whenever it exists (including the cases when it
is equal to $\infty$ or $-\infty$), is called the Lebesgue integral of $f$.
(iii) When the integral $\int_{\mathbb{R}} f\,dm$ is finite, we say that $f$ is Lebesgue integrable.


We need to make sure that the Lebesgue integral is indeed what we are looking
for, that is, it coincides with the Riemann integral of any continuous
function over an interval.


Proposition 1.39 
For any continuous function $f : \mathbb{R} \to \mathbb{R}$ and any numbers $a < b$

\[ \int_a^b f(x)\,dx = \int_{[a,b]} f\,dm, \]

with the Riemann integral on the left-hand side and Lebesgue integral on
the right-hand side.


Proof It is enough to consider $f \ge 0$. Otherwise we can consider $f^+$
and $f^-$ separately, and then combine the results.

For any $n = 1, 2, \ldots$, take $c_i = a + (b - a)i2^{-n}$ so that $a = c_0 < c_1 < \cdots < c_{2^n} = b$.
For each $i = 1, \ldots, 2^n$, since $f$ is a continuous function, it has a
minimum on $[c_{i-1}, c_i]$, which we can write as $f(x_i)$ for some $x_i \in [c_{i-1}, c_i]$.
We put $S_n = \sum_{i=1}^{2^n} f(x_i)(c_i - c_{i-1})$. Then, by the definition of the Riemann
integral,

\[ \int_a^b f(x)\,dx = \lim_{n\to\infty} S_n. \]

On the other hand,

\[ s_n = \sum_{i=1}^{2^n} f(x_i)\mathbf{1}_{[c_{i-1}, c_i)} + f(b)\mathbf{1}_{\{b\}} \]

is a non-decreasing sequence of simple functions, and $\lim_{n\to\infty} s_n = f\mathbf{1}_{[a,b]}$.
Moreover, $\int_{\mathbb{R}} s_n\,dm = S_n$. By the monotone convergence theorem (Theorem 1.31),

\[ \int_{[a,b]} f\,dm = \int_{\mathbb{R}} f\mathbf{1}_{[a,b]}\,dm = \lim_{n\to\infty} \int_{\mathbb{R}} s_n\,dm = \lim_{n\to\infty} S_n, \]

completing the proof. □
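The sums $S_n$ built from minima over dyadic subintervals can be computed exactly in a simple case. As an assumed example, take $f(x) = x^2$ on $[0, 1]$; since $f$ is increasing there, the minimum on each subinterval is attained at its left endpoint. The function name `lower_sum` is hypothetical.

```python
from fractions import Fraction

def lower_sum(n):
    """S_n for f(x) = x**2 on [0, 1] with the dyadic partition c_i = i / 2**n.

    f is increasing on [0, 1], so its minimum on [c_{i-1}, c_i] is f(c_{i-1}).
    """
    width = Fraction(1, 2 ** n)
    return sum(Fraction(i - 1, 2 ** n) ** 2 * width for i in range(1, 2 ** n + 1))

sums = [lower_sum(n) for n in range(1, 11)]
print([float(s) for s in sums])
```

The values form a non-decreasing sequence approaching $\int_0^1 x^2\,dx = 1/3$ from below, in line with the monotone convergence argument in the proof.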


While the Lebesgue integral coincides with the Riemann integral for 
continuous functions integrated over intervals, it is in fact much more gen- 
eral and covers various other cases. Here are a couple of relatively simple 
examples. 


Example 1.40 
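In Example 1.21 we saw that the Riemann integral $\int_0^1 f(x)\,dx$ does not exist
when the function $f$ is defined as

\[ f(x) = \begin{cases} 0 & \text{if } x \in \mathbb{Q}, \\ 1 & \text{if } x \in \mathbb{R} \setminus \mathbb{Q}. \end{cases} \]

However, the Lebesgue integral $\int_{[0,1]} f\,dm$ does exist and equals 1, as the
function $f\mathbf{1}_{[0,1]}$ is the indicator of the Borel measurable set $[0, 1] \setminus \mathbb{Q}$, which has
Lebesgue measure 1.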


Exercise 1.31 Recall the Cantor set $C$ (Exercise 1.7). Suppose that
$f : [0, 1] \to \mathbb{R}$ is defined by setting $f(x) = 0$ for all $x$ in $C$, and
$f(x) = k$ for all $x$ in each of the $2^{k-1}$ intervals of length $3^{-k}$ removed
from $[0, 1]$ in forming $C_k$. Calculate the Lebesgue integral $\int_{[0,1]} f\,dm$
and show that the Riemann integral $\int_0^1 f(x)\,dx$ does not exist.


Exercise 1.32 For any integrable function $f : \mathbb{R} \to [-\infty, \infty]$ and any
$a \in \mathbb{R}$ show that the function $g(x) = f(x - a)$ defined for all $x \in \mathbb{R}$ is
integrable and

\[ \int_{\mathbb{R}} g\,dm = \int_{\mathbb{R}} f\,dm. \]

This is known as translation invariance of the Lebesgue integral.

Hint. Refer to Exercise 1.8 concerning translation invariance for the
Lebesgue measure $m$.


More convergence results 


In addition to the monotone convergence theorem, there are other powerful 
results concerning limits of integrals. It will be important to have these 
ready in our toolbox. Once again we work with a general measure space
$(\Omega, \mathcal{F}, \mu)$, and begin with the following two inequalities.


Lemma 1.41 (Fatou lemmas) 
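Let $f_n : \Omega \to [0, \infty]$ be measurable functions for $n = 1, 2, \ldots$.
(i) The inequality

\[ \int_\Omega \Big( \liminf_{n\to\infty} f_n \Big)\,d\mu \le \liminf_{n\to\infty} \int_\Omega f_n\,d\mu \]

holds.
(ii) If, moreover, $f_n \le g$ for all $n$, where $g : \Omega \to [0, \infty]$ is integrable,
then

\[ \limsup_{n\to\infty} \int_\Omega f_n\,d\mu \le \int_\Omega \Big( \limsup_{n\to\infty} f_n \Big)\,d\mu. \]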


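Proof (i) Set $g_k = \inf_{n \ge k} f_n$. Then $g_k$ for $k = 1, 2, \ldots$ is a non-decreasing
sequence, and

\[ \lim_{k\to\infty} g_k = \sup_{k \ge 1} g_k = \sup_{k \ge 1} \inf_{n \ge k} f_n = \liminf_{n\to\infty} f_n. \]

Moreover, $g_k \le f_n$ and so $\int_\Omega g_k\,d\mu \le \int_\Omega f_n\,d\mu$ whenever $k \le n$. Hence, for
each $k \ge 1$

\[ \int_\Omega g_k\,d\mu \le \inf_{n \ge k} \int_\Omega f_n\,d\mu. \]

Because $g_k$ for $k = 1, 2, \ldots$ is a non-decreasing sequence, so is $\int_\Omega g_k\,d\mu$,
and it follows by the monotone convergence theorem (Theorem 1.31) that

\[ \int_\Omega \Big( \liminf_{n\to\infty} f_n \Big)\,d\mu = \int_\Omega \Big( \lim_{k\to\infty} g_k \Big)\,d\mu
= \lim_{k\to\infty} \int_\Omega g_k\,d\mu = \sup_{k \ge 1} \int_\Omega g_k\,d\mu
\le \sup_{k \ge 1} \inf_{n \ge k} \int_\Omega f_n\,d\mu = \liminf_{n\to\infty} \int_\Omega f_n\,d\mu. \]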


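(ii) Let $h_n = g - f_n$ (where we set $h_n(\omega) = 0$ for any $\omega \in \Omega$ such that
$g(\omega) = f_n(\omega) = \infty$). The functions $h_n : \Omega \to [0, \infty]$ are measurable for all
$n = 1, 2, \ldots$, and we can apply (i) to get

\[ \int_\Omega \Big( \liminf_{n\to\infty} h_n \Big)\,d\mu \le \liminf_{n\to\infty} \int_\Omega h_n\,d\mu. \]

Because $g$ is integrable and $0 \le f_n \le g$ for each $n$, it follows that $f_n$ is
integrable for each $n$. Moreover, it follows that $0 \le \limsup_{n\to\infty} f_n \le g$, and
so $\limsup_{n\to\infty} f_n$ is also integrable. As a result, by Exercise 1.27,

\[ \int_\Omega \Big( \liminf_{n\to\infty} h_n \Big)\,d\mu = \int_\Omega \Big( g - \limsup_{n\to\infty} f_n \Big)\,d\mu
= \int_\Omega g\,d\mu - \int_\Omega \Big( \limsup_{n\to\infty} f_n \Big)\,d\mu \]

on the left-hand side of the inequality, and

\[ \liminf_{n\to\infty} \int_\Omega h_n\,d\mu = \liminf_{n\to\infty} \Big( \int_\Omega g\,d\mu - \int_\Omega f_n\,d\mu \Big)
= \int_\Omega g\,d\mu - \limsup_{n\to\infty} \int_\Omega f_n\,d\mu \]

on the right-hand side, completing the proof. □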


Example 1.42 

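In general, we cannot expect equality in Lemma 1.41 (i). On the measure
space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), m)$ let $f_n = \mathbf{1}_{[n,n+1]}$ for $n = 1, 2, \ldots$. Then $\lim_{n\to\infty} f_n = 0$
because for any fixed real number $x$ we can find $n > x$. Hence
$\int_{\mathbb{R}} (\liminf_{n\to\infty} f_n)\,dm = 0$, while $\liminf_{n\to\infty} \int_{\mathbb{R}} f_n\,dm = 1$ since
$\int_{\mathbb{R}} f_n\,dm = 1$ for each $n = 1, 2, \ldots$.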
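The strict inequality in this example can be sketched in code. We assume here that $f_n$ is the indicator function of the interval $[n, n+1]$ (any unit interval moving off to infinity behaves the same way), and we approximate the lower limit at a point by an infimum over a long but finite tail, which is adequate for points well below the horizon.

```python
def f(n, x):
    """Assumed form of f_n: the indicator function of [n, n + 1]."""
    return 1.0 if n <= x <= n + 1 else 0.0

def integral_f(n):
    """Lebesgue integral of f_n, namely the length of [n, n + 1]."""
    return float((n + 1) - n)

xs = [0.5, 3.0, 7.25, 40.0]
horizon = 1000
# inf over 50 <= n < horizon of f_n(x): a finite stand-in for liminf at x < 50.
tail_inf = [min(f(n, x) for n in range(50, horizon)) for x in xs]
print(tail_inf, [integral_f(n) for n in (1, 10, 100)])
```

The pointwise lower limit is 0 at every sampled point, so its integral is 0, while every $\int f_n\,dm$ equals 1.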


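Exercise 1.33 Let $(\Omega, \mathcal{F}, P)$ be a probability space. Use Fatou's
lemma to show that for any sequence of events $A_n \in \mathcal{F}$, where
$n = 1, 2, \ldots$, we have

\[ P\Big( \bigcup_{n \ge 1} \bigcap_{k \ge n} A_k \Big) \le \liminf_{n\to\infty} P(A_n). \]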


In situations when we need to integrate the limit of a non-monotone 
sequence of functions, the following result often comes to the rescue. 


Theorem 1.43 (dominated convergence) 
Suppose that $f_n : \Omega \to [-\infty, \infty]$ are measurable functions for $n = 1, 2, \ldots$
and there is an integrable function $g : \Omega \to [0, \infty]$ such that $|f_n| \le g$ for
each $n$. Suppose further that $\lim_{n\to\infty} f_n = f$. Then $f$ and $f_n$ for each $n$ are
integrable, and

\[ \lim_{n\to\infty} \int_\Omega f_n\,d\mu = \int_\Omega f\,d\mu. \tag{1.10} \]


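Proof Since $|f_n| \le g$ for each $n$, it follows that $|f| \le g$, where $g$ is integrable.
This means that $f$ and $f_n$ for each $n$ are integrable, and so are $f - f_n$
and $|f - f_n|$ because $|f - f_n| \le |f| + |f_n| \le 2g$. The second Fatou lemma
(Lemma 1.41 (ii)) gives

\[ \limsup_{n\to\infty} \int_\Omega |f_n - f|\,d\mu \le \int_\Omega \Big( \limsup_{n\to\infty} |f_n - f| \Big)\,d\mu = 0 \]

since $\limsup_{n\to\infty} |f_n - f| = \lim_{n\to\infty} |f_n - f| = 0$. This completes the proof
because

\[ \left| \int_\Omega f_n\,d\mu - \int_\Omega f\,d\mu \right| = \left| \int_\Omega (f_n - f)\,d\mu \right| \le \int_\Omega |f_n - f|\,d\mu, \]

where Exercise 1.26 is used in the last inequality. □
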
Example 1.44 


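For an example with no integrable dominating function, consider the
sequence of functions on the measure space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), m)$ defined by
$f_n = n\mathbf{1}_{(0,\frac{1}{n}]}$ for $n = 1, 2, \ldots$. Here $g = \sup_{n \ge 1} f_n$ satisfies $g = n$ on the
interval $(\frac{1}{n+1}, \frac{1}{n}]$, so

\[ \int_{\mathbb{R}} g\,dm = \sum_{n=1}^\infty n\Big( \frac{1}{n} - \frac{1}{n+1} \Big) = \sum_{n=1}^\infty \frac{1}{n+1} = \infty. \]

We have $\lim_{n\to\infty} f_n = 0$ and $\int_{\mathbb{R}} f_n\,dm = 1$ for each $n$, which means that (1.10) fails in
this case.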
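The quantities in this example can be computed exactly. The sketch below, with hypothetical function names, evaluates $\int f_n\,dm$ and the partial sums of the series for $\int g\,dm$ using rational arithmetic.

```python
from fractions import Fraction

def integral_f(n):
    """Integral of f_n = n * 1_{(0, 1/n]}: height n times length 1/n."""
    return n * Fraction(1, n)

def integral_g_partial(N):
    """Part of the integral of g = sup_n f_n coming from (1/(N+1), 1]."""
    return sum(n * (Fraction(1, n) - Fraction(1, n + 1)) for n in range(1, N + 1))

print([integral_f(n) for n in (1, 10, 100)])
for N in (10, 100, 1000):
    print(N, float(integral_g_partial(N)))
```

Each $\int f_n\,dm$ equals 1, while the partial sums keep growing (roughly like $\ln N$), which is consistent with $g$ failing to be integrable.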


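Exercise 1.34 Let $f_n$ for $n = 1, 2, \ldots$ be a sequence of integrable
functions and suppose that $\sum_{n=1}^\infty \int_\Omega |f_n|\,d\mu$ is finite. Show that the series
$\sum_{n=1}^\infty f_n$ converges $\mu$-a.e., that its sum is an integrable function, and that

\[ \int_\Omega \Big( \sum_{n=1}^\infty f_n \Big)\,d\mu = \sum_{n=1}^\infty \int_\Omega f_n\,d\mu. \]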


Exercise 1.35 Use the previous exercise to calculate i dx. 




Exercise 1.36 Prove the following version of the dominated conver- 
gence theorem. 

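Suppose we are given real numbers $a < b$ and a function
$f : \Omega \times [a, b] \to \mathbb{R}$ such that $\omega \mapsto f(\omega, s)$ is measurable for each
$s \in [a, b]$. Suppose further that for some fixed $t \in [a, b]$ we have
$f(\omega, t) = \lim_{s \to t} f(\omega, s)$ for each $\omega \in \Omega$, and there is an integrable
function $g : \Omega \to \mathbb{R}$ such that $|f(\omega, s)| \le g(\omega)$ for each $\omega \in \Omega$ and
each $s \in [a, b]$. Then

\[ \int_\Omega f(\omega, t)\,d\mu(\omega) = \lim_{s \to t} \int_\Omega f(\omega, s)\,d\mu(\omega). \]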


1.5 Lebesgue outer measure 


Definition 1.45 
For any $A \subset \mathbb{R}$ we define

\[ m^*(A) = \inf\left\{ \sum_{k=1}^\infty l(J_k) : A \subset \bigcup_{k=1}^\infty J_k \right\}, \]

where the infimum is taken over all sequences $(J_k)_{k=1}^\infty$ consisting of open
intervals. This is called Lebesgue outer measure.


The Lebesgue outer measure $m^*$ extends the function $m$ defined on $\mathcal{B}(\mathbb{R})$
by (1.2) to the family of all subsets of $\mathbb{R}$. We have

\[ m(A) = m^*(A) \]

for each $A \in \mathcal{B}(\mathbb{R})$. Despite its name, Lebesgue outer measure is not a
measure on the subsets of $\mathbb{R}$.
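To see Definition 1.45 at work, consider covering countably many points by open intervals of rapidly shrinking length. The following sketch (all names are hypothetical, and only finitely many rationals are enumerated) covers the $k$-th listed rational in $[0, 1]$ by an interval of length $\varepsilon/2^k$, so that the total length stays below $\varepsilon$ however many points are listed.

```python
from fractions import Fraction

def rationals_in_unit_interval(max_denominator):
    """Distinct rationals k/d in [0, 1] with d <= max_denominator, in a fixed order."""
    seen, ordered = set(), []
    for d in range(1, max_denominator + 1):
        for k in range(d + 1):
            q = Fraction(k, d)
            if q not in seen:
                seen.add(q)
                ordered.append(q)
    return ordered

def cover(points, eps):
    """Open interval of length eps / 2**k centred at the k-th point, k = 1, 2, ..."""
    return [(q - eps / 2 ** (k + 1), q + eps / 2 ** (k + 1))
            for k, q in enumerate(points, start=1)]

eps = Fraction(1, 100)
points = rationals_in_unit_interval(20)
intervals = cover(points, eps)
total_length = sum(right - left for left, right in intervals)
print(len(points), float(total_length))
```

Since the bound $\varepsilon$ does not depend on the number of points, the same construction applied to an enumeration of all rationals in $[0, 1]$ shows that this set has Lebesgue outer measure 0.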


Proposition 1.46 
The Lebesgue outer measure $m^*$ has the following properties:
(i) $m^*(\emptyset) = 0$;
(ii) $A \subset B$ implies $m^*(A) \le m^*(B)$ for any $A, B \subset \mathbb{R}$ (monotonicity);
(iii) $m^*\big( \bigcup_{i=1}^\infty A_i \big) \le \sum_{i=1}^\infty m^*(A_i)$ for any $A_i \subset \mathbb{R}$, $i = 1, 2, \ldots$ (countable
subadditivity);
(iv) $m^*([a, b]) = m^*((a, b)) = m^*((a, b]) = m^*([a, b)) = b - a$ for each
$a < b$.


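Proof (i) Take $J_k = (a, a) = \emptyset$ for $k = 1, 2, \ldots$. This sequence covers $\emptyset$
by open intervals with total length 0. It follows that $m^*(\emptyset) = 0$.
(ii) If $A \subset B$ and $B$ is covered by a sequence $(J_k)_{k=1}^\infty$ of open intervals, then
$A$ is covered by the same sequence, which implies that $m^*(A) \le m^*(B)$.
(iii) To prove countable subadditivity, let $\varepsilon > 0$ be given. For each
$i = 1, 2, \ldots$ there is a sequence $(J_{i,k})_{k=1}^\infty$ covering $A_i$ by open intervals and such
that

\[ \sum_{k=1}^\infty l(J_{i,k}) \le m^*(A_i) + \frac{\varepsilon}{2^i}. \]

Summing over $i$, we have

\[ \sum_{i=1}^\infty \sum_{k=1}^\infty l(J_{i,k}) \le \sum_{i=1}^\infty \Big( m^*(A_i) + \frac{\varepsilon}{2^i} \Big) = \sum_{i=1}^\infty m^*(A_i) + \varepsilon. \]

But the double sequence $(J_{i,k})_{i,k=1}^\infty$ covers $\bigcup_{i=1}^\infty A_i$ by open intervals, so by
the definition of Lebesgue outer measure,

\[ m^*\Big( \bigcup_{i=1}^\infty A_i \Big) \le \sum_{i=1}^\infty \sum_{k=1}^\infty l(J_{i,k}) \le \sum_{i=1}^\infty m^*(A_i) + \varepsilon. \]

This argument works for an arbitrary $\varepsilon > 0$, which proves the claim.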
(iv) Let $a < b$. Take any $\varepsilon > 0$, and put $J_1 = (a - \frac{\varepsilon}{2}, b + \frac{\varepsilon}{2})$ and
$J_k = (a, a) = \emptyset$ for $k = 2, 3, \ldots$. This sequence covers $[a, b]$ by open intervals,
so

\[ m^*([a, b]) \le \sum_{k=1}^\infty l(J_k) = l(J_1) = b - a + \varepsilon. \]

This is so for every $\varepsilon > 0$, which implies that $m^*([a, b]) \le b - a$.
To prove that, on the other hand, $m^*([a, b]) \ge b - a$ take any $\varepsilon > 0$ and a
covering $(J_k)_{k=1}^\infty$ of $[a, b]$ by open intervals $J_k = (c_k, d_k)$ such that

\[ \sum_{k=1}^\infty l(J_k) \le m^*([a, b]) + \varepsilon. \]

For any $x \ge a$ we say that $[a, x]$ has a finite subcover whenever
$[a, x] \subset \bigcup_{k=1}^K J_k$ for some positive integer $K$. We are going to show that $[a, b]$ has a
finite subcover. To this end, define

\[ s = \sup\{x \ge a : [a, x] \text{ has a finite subcover}\}. \]

Since $a \in J_k$ for some $k$ and $J_k$ is an open interval, we have $[a, a + \delta] \subset J_k$
for some $\delta > 0$, implying that $s > a$. Now suppose that $s < b$. Since
$(J_k)_{k=1}^\infty$ covers $[a, b]$, we can find $j$ such that $s \in J_j$, and as $J_j$ is open we can
find $x_1, x_2 \in J_j$ with $a < x_1 < s < x_2$. Now $[a, x_1]$ has a finite subcover
since $x_1 < s$. But that subcover, together with $J_j$, gives a finite subcover of
$[a, x_2]$, and since $x_2 > s$, this contradicts the definition of $s$. Therefore we
cannot have $s < b$. It means that $s \ge b$. As a result, we have shown that
$[a, b]$ has a finite subcover $(J_k)_{k=1}^K$ for some positive integer $K$.

Let

\[ c = \min_{k=1,\ldots,K} c_k, \qquad d = \max_{k=1,\ldots,K} d_k. \]

Because $[a, b] \subset \bigcup_{k=1}^K (c_k, d_k)$, we must have $c < a < b < d$. It follows that

\[ b - a < d - c \le \sum_{k=1}^K (d_k - c_k) = \sum_{k=1}^K l(J_k) \le \sum_{k=1}^\infty l(J_k) \le m^*([a, b]) + \varepsilon. \]

This must be so for every $\varepsilon > 0$, implying that $b - a \le m^*([a, b])$. We can
conclude that $b - a = m^*([a, b])$.
Next, using (ii), (iii), (iv), we have

\[ \begin{aligned}
m^*((a, b)) \le m^*([a, b]) &= m^*([a, a] \cup (a, b) \cup [b, b]) \\
&\le m^*([a, a]) + m^*((a, b)) + m^*([b, b]) \\
&= (a - a) + m^*((a, b)) + (b - b) \\
&= m^*((a, b)),
\end{aligned} \]

so that $m^*((a, b)) = m^*([a, b])$. A similar argument shows that $m^*((a, b)) =
m^*([a, b))$ and $m^*((a, b)) = m^*((a, b])$. □


Definition 1.47 
A set $A \subset \mathbb{R}$ is said to be $m^*$-measurable if

\[ m^*(E) \ge m^*(E \cap A) + m^*(E \cap (\mathbb{R} \setminus A)) \tag{1.11} \]

for every $E \subset \mathbb{R}$. The collection of all $m^*$-measurable sets is denoted by $\mathcal{M}$.

Remarkably, as we shall see, this property will suffice for countable additivity.

Proposition 1.48 
The following properties hold:
(i) $\mathcal{M}$ is a $\sigma$-field on $\mathbb{R}$;
(ii) $m^*$ restricted to $\mathcal{M}$ is a measure;
(iii) every interval $(a, b)$ belongs to $\mathcal{M}$, that is, $\mathcal{I} \subset \mathcal{M}$.


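Proof (i) We already know that $m^*(\emptyset) = 0$, so for every $E \subset \mathbb{R}$

\[ m^*(E) = m^*(E) + m^*(\emptyset) = m^*(E \cap \mathbb{R}) + m^*(E \cap (\mathbb{R} \setminus \mathbb{R})), \]

which means that $\mathbb{R} \in \mathcal{M}$. Moreover, because $A = \mathbb{R} \setminus (\mathbb{R} \setminus A)$, it follows
from (1.11) that $A \in \mathcal{M}$ implies $\mathbb{R} \setminus A \in \mathcal{M}$.
Now let $A, B \in \mathcal{M}$. For any $E \subset \mathbb{R}$ we have by (1.11)

\[ \begin{aligned}
m^*(E) &\ge m^*(E \cap A) + m^*(E \cap (\mathbb{R} \setminus A)) \\
&\ge m^*(E \cap A \cap B) + m^*(E \cap A \cap (\mathbb{R} \setminus B)) \\
&\quad + m^*(E \cap (\mathbb{R} \setminus A) \cap B) + m^*(E \cap (\mathbb{R} \setminus A) \cap (\mathbb{R} \setminus B)),
\end{aligned} \]

where in the second inequality we use $E \cap A$ and $E \cap (\mathbb{R} \setminus A)$, respectively, in
place of $E$ in (1.11). Since $A \cup B \subset (A \cap B) \cup (A \cap (\mathbb{R} \setminus B)) \cup ((\mathbb{R} \setminus A) \cap B)$, by the
subadditivity of $m^*$, the sum of the first three terms is at least $m^*(E \cap (A \cup B))$.
In the final term $(\mathbb{R} \setminus A) \cap (\mathbb{R} \setminus B) = \mathbb{R} \setminus (A \cup B)$. As a result,

\[ m^*(E) \ge m^*(E \cap (A \cup B)) + m^*(E \cap (\mathbb{R} \setminus (A \cup B))), \]

which shows that $A \cup B \in \mathcal{M}$. We have shown that $A, B \in \mathcal{M}$ implies
$A \cup B \in \mathcal{M}$. By induction, this extends to any finite number of sets. If
$A_i \in \mathcal{M}$ for $i = 1, \ldots, n$, then $\bigcup_{i=1}^n A_i \in \mathcal{M}$.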

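Finally, take any sequence $A_i \in \mathcal{M}$ for $i = 1, 2, \ldots$, and put $D_n = \bigcup_{i=1}^n A_i$
and $D = \bigcup_{i=1}^\infty A_i$. It follows that $D_n \in \mathcal{M}$, so for any $E \subset \mathbb{R}$

\[ m^*(E) \ge m^*(E \cap D_n) + m^*(E \cap (\mathbb{R} \setminus D_n)). \]

Clearly $D_n \subset D$, so $\mathbb{R} \setminus D_n \supset \mathbb{R} \setminus D$, and by the monotonicity of $m^*$

\[ m^*(E \cap (\mathbb{R} \setminus D_n)) \ge m^*(E \cap (\mathbb{R} \setminus D)). \]

Next put $B_1 = A_1$ and $B_n = A_n \setminus D_{n-1}$ for $n = 2, 3, \ldots$. From what has
already been shown it follows that $B_n \in \mathcal{M}$ for all $n$. Using $E \cap D_n$ in place
of $E$ and $B_n$ in place of $A$ in (1.11), we get

\[ m^*(E \cap D_n) \ge m^*(E \cap D_n \cap B_n) + m^*(E \cap D_n \cap (\mathbb{R} \setminus B_n))
= m^*(E \cap B_n) + m^*(E \cap D_{n-1}). \]

We can repeat this for $m^*(E \cap D_i)$ with $i = n - 1, n - 2, \ldots, 1$ and obtain

\[ m^*(E \cap D_n) \ge \sum_{i=1}^n m^*(E \cap B_i). \]

It follows that

\[ m^*(E) \ge \sum_{i=1}^n m^*(E \cap B_i) + m^*(E \cap (\mathbb{R} \setminus D)) \]

for each $n$, and so

\[ \begin{aligned}
m^*(E) &\ge \sum_{i=1}^\infty m^*(E \cap B_i) + m^*(E \cap (\mathbb{R} \setminus D)) \qquad (1.12) \\
&\ge m^*\Big( E \cap \bigcup_{i=1}^\infty B_i \Big) + m^*(E \cap (\mathbb{R} \setminus D)) \\
&= m^*(E \cap D) + m^*(E \cap (\mathbb{R} \setminus D)),
\end{aligned} \]

where the second inequality is due to the countable subadditivity of $m^*$.
We have shown that $D = \bigcup_{i=1}^\infty A_i \in \mathcal{M}$, completing the proof that $\mathcal{M}$ is a
$\sigma$-field.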

(ii) We already know that $m^*(\emptyset) = 0$. Let $A_i \in \mathcal{M}$ for $i = 1, 2, \ldots$ be a
sequence of pairwise disjoint sets. Then $B_i = A_i$ for each $i$ and (1.12) with
$E = D = \bigcup_{i=1}^\infty A_i$ gives

\[ m^*(D) \ge \sum_{i=1}^\infty m^*(A_i). \]

Countable subadditivity, see Proposition 1.46 (iii), gives the reverse
inequality, proving that $m^*$ is countably additive, and hence it is a measure
on $\mathcal{M}$.
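(iii) Let $E \subset \mathbb{R}$ and $\varepsilon > 0$ be given. There is a sequence $(J_k)_{k=1}^\infty$ of open intervals
covering $E$ such that $m^*(E) + \varepsilon \ge \sum_{k=1}^\infty m^*(J_k)$. By subadditivity

\[ \begin{aligned}
m^*(E \cap (a, b)) &+ m^*(E \cap (\mathbb{R} \setminus (a, b))) \\
&\le m^*(E \cap (a, b)) + m^*(E \cap (-\infty, a]) + m^*(E \cap [b, \infty)) \\
&\le \sum_{k=1}^\infty \big[ m^*(J_k \cap (a, b)) + m^*(J_k \cap (-\infty, a]) + m^*(J_k \cap [b, \infty)) \big].
\end{aligned} \]

Inside the square brackets we have the lengths of the disjoint intervals $J_k \cap
(a, b)$, $J_k \cap (-\infty, a]$, $J_k \cap [b, \infty)$, which add up to give the length $l(J_k)$ of $J_k$.
Hence

\[ m^*(E \cap (a, b)) + m^*(E \cap (\mathbb{R} \setminus (a, b))) \le \sum_{k=1}^\infty l(J_k) \le m^*(E) + \varepsilon. \]

Since this holds for all $\varepsilon > 0$, we have shown that (1.11) with $A = (a, b)$
holds for every $E \subset \mathbb{R}$, which means that $(a, b) \in \mathcal{M}$. It follows that $\mathcal{I} \subset \mathcal{M}$.
□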


Finally, we are ready to prove Theorem 1.19. 




Theorem 1.19 
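$m : \mathcal{B}(\mathbb{R}) \to [0, \infty]$ is a measure such that $m((a, b)) = b - a$ for all $a < b$.


Proof Because $\mathcal{M}$ is a $\sigma$-field and $\mathcal{I} \subset \mathcal{M}$, we know that $\mathcal{B}(\mathbb{R}) \subset \mathcal{M}$ (see
Exercise 1.3). Because $m^*$ is a measure on $\mathcal{M}$ and $m(A) = m^*(A)$ for every
$A \in \mathcal{B}(\mathbb{R})$, it follows that $m$ is a measure on $\mathcal{B}(\mathbb{R})$. Moreover, for any $a < b$,
since $(a, b) \in \mathcal{B}(\mathbb{R})$, we have $m((a, b)) = m^*((a, b)) = b - a$. □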


2 


Probability distributions and random 
variables 


2.1 Probability distributions 

2.2. Random variables 

2.3 Expectation and variance 

2.4 Moments and characteristic functions 


We again motivate our discussion through simple examples of pricing mod- 
els. In such applications, we often have information about the probability 
distribution of future prices, so it is natural to begin our analysis with dis- 
tribution functions and densities. Then we look at measurable functions 
defined on probability spaces, commonly known as random variables, and 
the probability distributions associated with them. One often has to infer 
the structure of the distribution from simpler data, such as the expectation, 
variance or higher moments, and methods for computing these thus play 
a major role. Finally, we introduce characteristic functions as a vehicle 
for analysing distributions and computing the moments of a given random 
variable. The full power of characteristic functions will become apparent 
in Chapter 5, where it will be shown that the characteristic function of a 
random variable determines its distribution. 


2.1 Probability distributions 


In Chapter 1 we looked at some examples of probabilities. These included
the uniform, binomial, Poisson and geometric probabilities in a discrete
setting, and the probabilities associated with the normal and log-normal
densities in a continuous setting. These examples can be revisited using
the notions of probability distribution and distribution function, which can
also be used to describe a multitude of other useful probabilities in a unified
manner.


Figure 2.1 Distribution function for the binomial distribution in Example 2.2
with N = 10 and p = 0.5.


Definition 2.1
A probability distribution is by definition a probability measure $P$ on $\mathbb{R}$
defined on the $\sigma$-field of Borel sets $\mathcal{B}(\mathbb{R})$. The function $F : \mathbb{R} \to [0,1]$
defined as

\[
F(x) = P((-\infty, x])
\]

for each $x \in \mathbb{R}$ is called the (cumulative) distribution function.


Example 2.2
Let $0 < p < 1$ and fix $N = 1, 2, \ldots$. For any $A \in \mathcal{B}(\mathbb{R})$ let

\[
P(A) = \sum_{n=0}^{N} \binom{N}{n} p^n (1 - p)^{N-n} \mathbf{1}_A(n).
\]

This defines a probability measure on $\mathbb{R}$, called the binomial distribution
with parameters $N, p$. It corresponds to the binomial probability defined in
Example 1.13. The corresponding distribution function is piecewise constant:

\[
F(x) = \begin{cases}
0 & \text{for } x < 0, \\
\sum_{n=0}^{k} \binom{N}{n} p^n (1 - p)^{N-n} & \text{for } k = 0, 1, \ldots, N-1 \text{ and } k \le x < k+1, \\
1 & \text{for } N \le x.
\end{cases}
\]

This function is shown in Figure 2.1. The dots represent the values $F(x)$ at
$x = 0, 1, \ldots, 10$, where the distribution function has discontinuities.
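The piecewise formula for the binomial distribution function can be checked numerically. The following is only a minimal sketch in Python; the function name `binomial_cdf` is our own choice and does not come from the text.

```python
from math import comb, floor


def binomial_cdf(x, N, p):
    """F(x) = P((-inf, x]) for the binomial distribution with parameters N, p."""
    if x < 0:
        return 0.0
    if x >= N:
        return 1.0
    k = floor(x)  # the integer k with k <= x < k + 1
    return sum(comb(N, n) * p**n * (1 - p) ** (N - n) for n in range(k + 1))


# Values of F for N = 10, p = 0.5, the parameters used in Figure 2.1.
for x in (-1, 0, 4.5, 5, 10):
    print(x, binomial_cdf(x, 10, 0.5))
```

Because F is constant on each interval [k, k + 1), evaluating it at 4 and at 4.5 gives the same value, while the value jumps at each integer between 0 and N.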


Figure 2.2 Distribution function for the Poisson distribution in Example 2.3
with $\lambda = 2$.


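Example 2.3
Let $\lambda > 0$. The probability measure on $\mathbb{R}$ defined by

\[
P(A) = \sum_{n=0}^{\infty} \frac{\lambda^n}{n!} e^{-\lambda} \mathbf{1}_A(n)
\]

for any $A \in \mathcal{B}(\mathbb{R})$ is called the Poisson distribution with parameter $\lambda$.
Compare this with the Poisson probability in Example 1.8. The corresponding
distribution function

\[
F(x) = \begin{cases}
0 & \text{for } x < 0, \\
\sum_{n=0}^{k} \frac{\lambda^n}{n!} e^{-\lambda} & \text{for } k = 0, 1, \ldots \text{ and } k \le x < k+1
\end{cases}
\]

is depicted in Figure 2.2. The dots represent the values $F(x)$ at $x = 0, 1, \ldots$,
where the distribution function has discontinuities.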


Definition 2.4
We say that a probability distribution $P$ is discrete whenever there is a
sequence $x_1, x_2, \ldots \in \mathbb{R}$ such that $\sum_{n=1}^{\infty} P(\{x_n\}) = 1$.


Example 2.5 
The binomial distribution and the Poisson distribution are examples of dis- 
crete probability distributions. 


According to Theorem 1.35, if $f : \mathbb{R} \to [0, \infty]$ is Borel measurable, then
the integral $\int_B f \, dm$, considered as a function of $B \in \mathcal{B}(\mathbb{R})$, is a measure. If,
in addition, $\int_{\mathbb{R}} f \, dm = 1$, then this is a probability measure. We have seen
two examples of such functions $f$, the normal density (Example 1.23) and
the log-normal density (Example 1.24).


Figure 2.3 Distribution function for the normal distribution in Example 2.8
(solid line) and for the log-normal distribution in Example 2.9 (broken line).


Definition 2.6
Any Borel measurable function $f : \mathbb{R} \to [0, \infty]$ such that $\int_{\mathbb{R}} f \, dm = 1$ is
called a probability density.


When $f$ is a probability density, then

\[
P(B) = \int_B f \, dm
\]

defined for any Borel set $B \in \mathcal{B}(\mathbb{R})$ is a probability distribution with
distribution function

\[
F(x) = P((-\infty, x]) = \int_{(-\infty, x]} f \, dm.
\]


Definition 2.7
We say that a probability distribution $P$ is continuous (sometimes referred
to as absolutely continuous) if there is a density function $f$ such that for
each $B \in \mathcal{B}(\mathbb{R})$

\[
P(B) = \int_B f \, dm.
\]


Example 2.8 

The normal distribution is the probability distribution corresponding to 
the normal density specified in Example 1.23. See Figure 2.3 for the distri- 
bution function in this case. 




Example 2.9 

The probability distribution with the log-normal density in Example 1.24 is 
referred to as the log-normal distribution. The corresponding distribution 
function is shown as a broken line in Figure 2.3. 


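Example 2.10
The density

\[
f(x) = \begin{cases}
\lambda e^{-\lambda x} & \text{if } x \ge 0, \\
0 & \text{if } x < 0,
\end{cases}
\tag{2.1}
\]

yields the so-called exponential distribution with parameter $\lambda > 0$. The
corresponding distribution function is given by

\[
F(x) = \begin{cases}
1 - e^{-\lambda x} & \text{if } x \ge 0, \\
0 & \text{if } x < 0,
\end{cases}
\]

and shown in Figure 2.4.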


The following proposition lists some properties shared by all distribution 
functions. 


Proposition 2.11
The distribution function $F$ of any probability distribution has the following
properties:
(i) $F(x) \le F(y)$ for every $x \le y$ ($F$ is non-decreasing);
(ii) $\lim_{x \searrow a} F(x) = F(a)$ for each $a \in \mathbb{R}$ ($F$ is right-continuous);
(iii) $\lim_{x \to -\infty} F(x) = 0$;
(iv) $\lim_{x \to \infty} F(x) = 1$.


Proof Let $F$ be the distribution function of a probability distribution $P$,
that is, let $F(x) = P((-\infty, x])$ for each $x \in \mathbb{R}$.

(i) Note that $x \le y$ implies $(-\infty, x] \subset (-\infty, y]$, so $P((-\infty, x]) \le P((-\infty, y])$
by Theorem 1.11 (iii).

(ii) Take any non-increasing sequence of numbers $x_n$ such that $x_n > a$
and $\lim_{n \to \infty} x_n = a$. Then $(-\infty, a] = \bigcap_{n=1}^{\infty} (-\infty, x_n]$, and
$P((-\infty, a]) = P\left(\bigcap_{n=1}^{\infty} (-\infty, x_n]\right) = \lim_{n \to \infty} P((-\infty, x_n])$ by Theorem 1.11 (vii).

(iii) If $x_n$ is a non-increasing sequence of numbers such that $\lim_{n \to \infty} x_n = -\infty$,
then $\emptyset = \bigcap_{n=1}^{\infty} (-\infty, x_n]$, so $0 = P\left(\bigcap_{n=1}^{\infty} (-\infty, x_n]\right) = \lim_{n \to \infty} P((-\infty, x_n])$,
once again by Theorem 1.11 (vii).

(iv) Using Theorem 1.11 (v) with $B_n = (-\infty, x_n]$, where $x_n$ is a non-decreasing
sequence such that $\lim_{n \to \infty} x_n = \infty$, we have $\mathbb{R} = \bigcup_{n=1}^{\infty} (-\infty, x_n]$,
and so $1 = P\left(\bigcup_{n=1}^{\infty} (-\infty, x_n]\right) = \lim_{n \to \infty} P((-\infty, x_n])$. □


Figure 2.4 Distribution function for the exponential distribution in Example 2.10
with $\lambda = 1$.


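In general, $F$ does not have to be continuous. Because $F$ is non-decreasing,
the left limit $F(a-) = \lim_{x \nearrow a} F(x)$ exists for each $a \in \mathbb{R}$. We have

\[
P(\{a\}) = P\left(\bigcap_{n=1}^{\infty} \left(a - \tfrac{1}{n}, a\right]\right)
= \lim_{n \to \infty} P\left(\left(a - \tfrac{1}{n}, a\right]\right)
= \lim_{n \to \infty} \left(F(a) - F\left(a - \tfrac{1}{n}\right)\right) = F(a) - F(a-).
\]

If $F$ has a discontinuity at $a$, then $F(a-) < F(a)$, and so $P(\{a\}) > 0$. On the
other hand, if $F$ is continuous at $a$, then $F(a-) = F(a)$ and $P(\{a\}) = 0$.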


Exercise 2.1 Suppose $F$ is a distribution function. Show that there
are at most countably many $a \in \mathbb{R}$ such that $F(a-) < F(a)$. (We say
that $F$ has at most countably many jump discontinuities.)




Example 2.12
Suppose that $F$ is a piecewise constant distribution function with a finite
number of jumps at points $a_1 < a_2 < \cdots < a_N$ with $F(a_n) - F(a_n-) = p_n$,
where $\sum_{n=1}^{N} p_n = 1$. So

\[
F(x) = \begin{cases}
0 & \text{for } x < a_1, \\
\sum_{n=1}^{k} p_n & \text{for } k = 1, \ldots, N-1 \text{ and } a_k \le x < a_{k+1}, \\
1 & \text{for } a_N \le x.
\end{cases}
\]

The corresponding probability distribution satisfies $P(\{a_n\}) = p_n$ for each $n$,
and $P(\{a_1, \ldots, a_N\}) = 1$. It is a discrete probability distribution concentrated
on a finite set. Example 2.2 falls into this category.


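Example 2.13
In particular if $N = 1$ and $a_1 = a$ for some $a \in \mathbb{R}$, we have

\[
F(x) = \mathbf{1}_{[a, \infty)}(x).
\]

The corresponding probability distribution is the unit mass concentrated
at $a$ (see Example 1.12),

\[
\delta_a(A) = \begin{cases}
1 & \text{if } a \in A, \\
0 & \text{if } a \notin A,
\end{cases}
\]

for any Borel set $A \in \mathcal{B}(\mathbb{R})$.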


Example 2.14
Let $F_{\text{log-norm}}$ denote the log-normal distribution function from Example 2.9.
Fix a number $0 < p < 1$ and write

\[
F(x) = \begin{cases}
0 & \text{for } x < 0, \\
p + (1 - p) F_{\text{log-norm}}(x) & \text{for } x \ge 0.
\end{cases}
\]

This is a distribution function, with a jump of size $p$ at 0. The resulting
probability distribution is

\[
P = p \delta_0 + (1 - p) P_{\text{log-norm}},
\]

where $\delta_0$ is the unit mass concentrated at 0, and where $P_{\text{log-norm}}$ is the
log-normal distribution with density given in Example 1.24. Here we have a
mixture of discrete and continuous distributions. It can be viewed as a
model of the stock price for a company which can suffer bankruptcy with
probability $p$. The graph of $F$ with $p = 0.3$ is shown in Figure 2.5. The dot
represents the value $F(x)$ at $x = 0$, where the distribution function has a
discontinuity.


Figure 2.5 Distribution function in Example 2.14 with $p = 0.3$.
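The mixture in Example 2.14 can be sketched in a few lines of Python. This is an illustration only: the function names are our own, and the log-normal parameters `mu` and `sigma` used below are hypothetical placeholders, since the values from Example 1.24 are not repeated here.

```python
from math import log
from statistics import NormalDist


def lognormal_cdf(x, mu, sigma):
    """Distribution function of Y = exp(X), where X ~ N(mu, sigma^2)."""
    if x <= 0:
        return 0.0
    return NormalDist(mu, sigma).cdf(log(x))


def bankruptcy_mixture_cdf(x, p, mu, sigma):
    """F(x) = 0 for x < 0 and p + (1 - p) * F_lognorm(x) for x >= 0."""
    if x < 0:
        return 0.0
    return p + (1 - p) * lognormal_cdf(x, mu, sigma)


# Jump of size p at 0 (mu = 0 and sigma = 1 are assumed values).
print(bankruptcy_mixture_cdf(-1e-9, 0.3, 0.0, 1.0))  # just left of 0
print(bankruptcy_mixture_cdf(0.0, 0.3, 0.0, 1.0))    # at 0
```

The jump F(0) - F(0-) equals p, which is the probability that the price is exactly 0.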


Exercise 2.2 Let $P_1, P_2, \ldots$ be probability measures defined on a $\sigma$-field
$\mathcal{F}$, and let $a_1, a_2, \ldots > 0$. What condition on the sequence $a_n$ is
needed to ensure that $P = \sum_{n=1}^{\infty} a_n P_n$ is a probability measure?


Remark 2.15
The converse of Proposition 2.11 is also true. If $F : \mathbb{R} \to [0,1]$ satisfies
conditions (i)-(iv) of this proposition, then $F$ is the distribution function of
some probability distribution $P$. This provides a simple and convenient way
to describe a probability distribution by specifying its distribution function.


2.2 Random variables 


Derivative securities (also called contingent claims) play an important
role in finance. They represent financial assets whose present value is
determined by some future payoff. In the case of European derivative
securities the payoff is available at a prescribed future time $T$ and depends on
the value $S(T)$ at that time of a stock or some other risky asset, called the
underlying security. We can write the payoff as $H = h(S(T))$ for some
function $h$. For example, with $h(x) = (x - K)^+$ we have a (European) call
option, while $h(x) = (K - x)^+$ gives a put option. Here $K$ is called the
strike price.

An important step in studying derivative securities is to build a model
of the future values of $S(T)$. Various such models have been proposed.
Suppose that we have chosen a probability space $(\Omega, \mathcal{F}, P)$ where the set $\Omega$
represents all possible values of $S(T)$. It is natural to ask for the probability
$P(\{H \in B\})$ that the payoff $H$ takes values in some Borel set $B \in \mathcal{B}(\mathbb{R})$,
for example in an interval $B = [a, b]$. The set $\{H \in B\}$ should belong to
the domain of $P$ (which is $\mathcal{F}$, the $\sigma$-field of events). Functions with this
property were called measurable in Definition 1.27. When working with
probability spaces we call them random variables.


Definition 2.16
A random variable is a function $X : \Omega \to \mathbb{R}$ such that for each Borel set
$B \in \mathcal{B}(\mathbb{R})$

\[
\{X \in B\} \in \mathcal{F}.
\]

The family of all sets of the form $\{X \in B\}$ for some $B \in \mathcal{B}(\mathbb{R})$ is denoted
by $\sigma(X)$ and is called the $\sigma$-field generated by $X$. We can see that $X$ is a
random variable if and only if $\sigma(X) \subset \mathcal{F}$.


Exercise 2.3 Show that $\sigma(X)$ is indeed a $\sigma$-field.


If $h : \mathbb{R} \to \mathbb{R}$ and $X : \Omega \to \mathbb{R}$, we shall often write $h(X)$ to denote
the composition $h \circ X$. In fact this has already been done implicitly in the
expression $h(S(T))$ above.

If $h$ is a Borel measurable function, then $h(X)$ is measurable with respect
to $\sigma(X)$ since $h^{-1}(B)$ is a Borel set for any $B \in \mathcal{B}(\mathbb{R})$, and so

\[
\{h(X) \in B\} = \{X \in h^{-1}(B)\} \in \sigma(X).
\]

In particular, when $X$ is a random variable, it follows that $h(X)$ is also a
random variable.

It turns out that all functions measurable with respect to $\sigma(X)$ are of the
form $h(X)$ for some Borel measurable function $h$.


Exercise 2.4 Show that $Y : \Omega \to \mathbb{R}$ is measurable with respect to
$\sigma(X)$ if and only if there is a Borel measurable function $h : \mathbb{R} \to \mathbb{R}$
such that $Y = h(X)$.


Example 2.17
The payoff functions of derivative securities provide many examples of random
variables. For European options we can express the payoff in the form
$h(S(T))$, so we need only to show that the function $h$ is Borel measurable.
Familiar examples are call or put options with strike $K$, whose payoff
functions are $h(x) = (x - K)^+$ and $h(x) = (K - x)^+$ respectively. Other popular
options include the following.
(1) A bottom straddle, which consists of buying a call and a put with
the same strike $K$, so that the payoff is $h(x) = |x - K|$.
(2) A strangle, where we buy a call and a put with different strikes $K_1 < K_2$,
so that

\[
h(x) = (x - K_1)^+ + (K_2 - x)^+.
\]

This reduces to a straddle when $K_1 = K_2$.
(3) A bull spread, consisting of two calls, one long and one short, with
strikes $K_1 < K_2$, so that

\[
h(x) = \begin{cases}
0 & \text{if } x < K_1, \\
x - K_1 & \text{if } K_1 \le x \le K_2, \\
K_2 - K_1 & \text{if } x > K_2.
\end{cases}
\]

(4) A butterfly spread, where we buy calls at strikes $K_1 < K_3$ and sell
two calls at strike $K_2 = \frac{1}{2}(K_1 + K_3)$. You should verify that the payoff
is zero outside $(K_1, K_3)$ and equals $x - K_1$ on $[K_1, K_2]$ and $K_3 - x$ on
$[K_2, K_3]$.

In all these cases the Borel measurability of $h$ follows at once from the
exercises in Chapter 1, so the payoffs are random variables.
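The payoff functions listed in Example 2.17 are easy to experiment with numerically. The sketch below is illustrative only; the function names are our own, and the strangle follows the formula exactly as printed above (call struck at K1, put struck at K2).

```python
def call(x, K):
    """Call payoff (x - K)^+."""
    return max(x - K, 0.0)


def put(x, K):
    """Put payoff (K - x)^+."""
    return max(K - x, 0.0)


def straddle(x, K):
    """Long call plus long put with the same strike: |x - K|."""
    return call(x, K) + put(x, K)


def strangle(x, K1, K2):
    """(x - K1)^+ + (K2 - x)^+, as in item (2)."""
    return call(x, K1) + put(x, K2)


def bull_spread(x, K1, K2):
    """Long call at K1, short call at K2, with K1 < K2."""
    return call(x, K1) - call(x, K2)


def butterfly(x, K1, K3):
    """Long calls at K1 and K3, two short calls at the midpoint K2."""
    K2 = 0.5 * (K1 + K3)
    return call(x, K1) - 2.0 * call(x, K2) + call(x, K3)


# Butterfly with K1 = 8, K3 = 12: zero outside (8, 12), peak at K2 = 10.
for x in (7.0, 9.0, 10.0, 11.0, 13.0):
    print(x, butterfly(x, 8.0, 12.0))
```

Evaluating `butterfly` on a grid is one way to carry out the verification suggested in item (4).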


Definition 2.18
With every random variable $X$ we can associate a probability distribution
$P_X$, called the distribution of $X$, defined as

\[
P_X(B) = P(\{X \in B\})
\]

for any Borel set $B \in \mathcal{B}(\mathbb{R})$. The corresponding distribution function

\[
F_X(x) = P_X((-\infty, x])
\]

defined for each $x \in \mathbb{R}$ is called the (cumulative) distribution function of
the random variable $X$.


Since we will frequently work with probabilities of the form $P(\{X \in B\})$,
from now on we condense these to $P(X \in B)$ for ease of notation when there
is no risk of ambiguity.


Exercise 2.5 Let $X$ be a random variable modelling a single coin
toss, with two possible outcomes: 1 (heads) or $-1$ (tails). For a fair
coin, the probabilities are $P(X = 1) = P(X = -1) = \frac{1}{2}$. Sketch the
distribution function $F_X$.


Exercise 2.6 Let $X$ be the number of tosses of a fair coin up to and
including the first toss showing heads. Compute and sketch the
distribution function $F_X$.


Exercise 2.7 Suppose that $X_n$ for each $n = 1, 2, \ldots$ is a random variable
having the binomial distribution with parameters $n, p$ (see Example 2.2),
where $p = \frac{\lambda}{n}$ for some $\lambda > 0$. Find $P_{X_n}(\{k\})$ for $k = 0, 1, \ldots$, and
determine $\lim_{n \to \infty} P_{X_n}(\{k\})$.


The two classes of random variables that will mainly concern us can be 
distinguished by the nature of their distributions. 


Definition 2.19
We say that a random variable $X$ is discrete if it has a discrete probability
distribution $P_X$, that is, if there is a sequence $x_1, x_2, \ldots \in \mathbb{R}$ such that

\[
\sum_{n=1}^{\infty} P_X(\{x_n\}) = 1.
\]


Definition 2.20
We say that a random variable $X$ is continuous if there exists an integrable
function $f_X : \mathbb{R} \to [0, \infty]$ such that for every Borel set $B \in \mathcal{B}(\mathbb{R})$

\[
P_X(B) = \int_B f_X \, dm.
\]

We call $f_X$ the density of $X$.


In these cases the distribution function $F_X$ can be expressed as follows:
for any $y \in \mathbb{R}$
(i) if $X$ is discrete,

\[
F_X(y) = \sum_{x_n \le y} p_n, \quad \text{where } p_n = P_X(\{x_n\});
\]

(ii) if $X$ is continuous,

\[
F_X(y) = \int_{(-\infty, y]} f_X \, dm.
\]


Example 2.21
In Example 1.9 we saw that $p_n = (1 - p)^{n-1} p$ (where $0 < p < 1$) defines a
probability on $\Omega = \{1, 2, \ldots\}$. This gives rise to a random variable $X(n) = n$
for $n = 1, 2, \ldots$ with geometric distribution $P_X(\{n\}) = P(X = n) = p_n$.
Since $\sum_{n=1}^{\infty} p_n = 1$, it is clear that $X$ is a discrete random variable.


Exercise 2.8 As in Example 1.9, we can think of $1, 2, \ldots$ as trading
dates, and regard $p$ and $1 - p$ as the probabilities of upward/downward
price moves on any given trading date. Let $Y$ be the number of trading
dates needed to record $r$ upward price moves. Show that
$P_Y(\{n\}) = \binom{n-1}{r-1} p^r (1 - p)^{n-r}$ for $n = r, r+1, \ldots$, and verify that this is a
probability distribution. It is called the negative binomial distribution.


Example 2.22
We say that a random variable $X$ has normal distribution if the probability
density takes the form introduced in Example 1.23:

\[
f_X(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
\]

for some $\mu \in \mathbb{R}$ and $\sigma > 0$. In this case we say simply that $X$ has the
$N(\mu, \sigma^2)$ distribution, abbreviated to $X \sim N(\mu, \sigma^2)$. The corresponding
distribution function is

\[
F_X(x) = \int_{-\infty}^{x} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(y - \mu)^2}{2\sigma^2}} \, dy.
\]

Since $f_X(x) > 0$ for all $x$, it follows that $F_X : \mathbb{R} \to [0, 1]$ is strictly
increasing, hence invertible.
In particular, if $\mu = 0$ and $\sigma = 1$ we use the notation $N(x)$ for $F_X(x)$,
and say that the random variable $X \sim N(0, 1)$ has the standard normal
distribution.


Exercise 2.9 Given a random variable $X$ with standard normal distribution
$N(0, 1)$, let $Y = \mu + \sigma X$ for some $\mu, \sigma \in \mathbb{R}$ such that $\sigma > 0$.
Show that $Y$ has the normal distribution $N(\mu, \sigma^2)$.


Example 2.23
With $X \sim N(\mu, \sigma^2)$, write $Y(x) = F_X^{-1}(x)$. This function is a random variable
on the probability space $[0, 1]$ with uniform probability given by the
Lebesgue measure $m$. The random variable $Y$ has the same normal distribution
$N(\mu, \sigma^2)$ since

\[
F_Y(a) = m(\{x \in [0,1] : Y(x) \le a\}) = m(\{x \in [0,1] : F_X^{-1}(x) \le a\})
= m(\{x \in [0,1] : x \le F_X(a)\}) = m([0, F_X(a)]) = F_X(a).
\]
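The computation in Example 2.23 can be illustrated numerically. In this rough sketch (function names are our own) the Lebesgue measure of the set where Y is at most a is approximated by the fraction of points of a uniform midpoint grid on [0, 1] that fall into it; `NormalDist.inv_cdf` from the Python standard library plays the role of the inverse of F_X.

```python
from statistics import NormalDist


def Y(x, mu, sigma):
    """Y(x) = F_X^{-1}(x) for X ~ N(mu, sigma^2), defined for 0 < x < 1."""
    return NormalDist(mu, sigma).inv_cdf(x)


def lebesgue_measure_of_event(a, mu, sigma, n=10_000):
    """Approximate m({x in [0,1] : Y(x) <= a}) on a midpoint grid of n points."""
    dist = NormalDist(mu, sigma)
    hits = sum(1 for i in range(n) if dist.inv_cdf((i + 0.5) / n) <= a)
    return hits / n


# Compare the grid approximation of F_Y(a) with F_X(a); parameters are arbitrary.
mu, sigma, a = 0.5, 2.0, 1.0
print(lebesgue_measure_of_event(a, mu, sigma), NormalDist(mu, sigma).cdf(a))
```

The two printed numbers should agree up to the grid resolution 1/n, in line with the identity F_Y = F_X.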


The distributions (and the densities, when they exist) of simple algebraic 
functions of X can often be found directly. 


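Example 2.24
Let $X$ be a random variable and let $Y = aX + b$ for some $a \ne 0$ and $b \in \mathbb{R}$.
We find the distribution function $F_Y$ in terms of $F_X$. For $a > 0$

\[
F_Y(y) = P(aX + b \le y) = P\left(X \le \frac{y - b}{a}\right) = F_X\left(\frac{y - b}{a}\right).
\]

On the other hand, for $a < 0$

\[
F_Y(y) = P(aX + b \le y) = P\left(X \ge \frac{y - b}{a}\right)
= 1 - P\left(X < \frac{y - b}{a}\right)
= 1 - \lim_{x \nearrow \frac{y - b}{a}} F_X(x) = 1 - F_X\left(\frac{y - b}{a} -\right).
\]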

Exercise 2.10 Suppose that $X$ has continuous distribution with density
$f_X$, and let $Y = aX + b$ for some $a, b \in \mathbb{R}$ such that $a \ne 0$. Show
that

\[
f_Y(y) = \frac{1}{|a|} f_X\left(\frac{y - b}{a}\right).
\]


Exercise 2.11 Suppose that $X$ is a random variable having uniform
distribution on the interval $[-1, 1]$, i.e. such that the density of $X$ is
$f_X = \frac{1}{2} \mathbf{1}_{[-1,1]}$. Find the distribution function of $Y = \frac{1}{X}$.


Example 2.25
We say that a random variable $Y > 0$ has log-normal distribution if $X = \ln Y$
has normal distribution. We shall find the density of $Y$. Take any $0 < a < b$,
which is sufficient because $Y = e^X$ takes only positive values, and
employ the normal density $f_X$:

\[
P(a \le Y \le b) = P(\ln a \le X \le \ln b)
= \int_{\ln a}^{\ln b} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \, dx
= \int_a^b \frac{1}{y \sigma \sqrt{2\pi}} e^{-\frac{(\ln y - \mu)^2}{2\sigma^2}} \, dy,
\]

where we make the substitution $x = \ln y$ in the integral. The probability
density of $Y$ has the familiar form of the log-normal density from
Example 1.24.


Exercise 2.12 Suppose that $X$ is a random variable with known density
$f_X$. Find the density of $Y = g(X)$ if $g : \mathbb{R} \to \mathbb{R}$ is continuously
differentiable and $g'(x) \ne 0$ for all $x \in \mathbb{R}$.


Application to stock prices and option payoffs 


If $X$ is a random variable with standard normal distribution $N(0, 1)$ defined
on any probability space, then $\mu T + \sigma \sqrt{T} X$ has the normal distribution
$N(\mu T, \sigma^2 T)$ (see Exercise 2.9), and

\[
S(T) = S(0) e^{\mu T + \sigma \sqrt{T} X} \tag{2.2}
\]

can be used as a model of log-normally distributed stock prices.


Example 2.26
If we want to represent the random future stock price $S(T)$ on a concrete
probability space, we can for example take $\Omega = \mathbb{R}$ with $P$ given by
$P(B) = \int_B f \, dm$, where $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is the standard normal density. Then $X$
such that $X(\omega) = \omega$ for each $\omega \in \mathbb{R}$ is a random variable on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), P)$
with the standard normal distribution $N(0, 1)$, and $S(T)$ given by (2.2) is a
log-normally distributed random variable defined on this probability space.


Various choices of the probability space can lead to the same distribu- 
tion. There is no single universally accepted probability space, allowing 
much flexibility in selecting one to suit particular needs. 


Example 2.27
Let $X$ be a random variable with standard normal distribution $N(0, 1)$
defined on the unit interval $\Omega = [0, 1]$ with Lebesgue measure $m$ as the
uniform probability. Such a random variable was constructed in Example 2.23.
Then the log-normally distributed stock price $S(T)$ modelled by (2.2) will
also be a random variable on $\Omega = [0, 1]$. This lends itself well to applying
a numerical technique known as Monte Carlo simulation. An approximation
of the log-normal distribution function generated in this way is shown
in Figure 2.6. Here $T = 1$, the parameters $\mu$ and $\sigma$ of the log-normal
distribution are as in Example 1.24, and a sample of 100 points drawn from
$[0, 1]$ is used. For comparison, the dotted line shows the exact distribution
function.


Figure 2.6 Monte Carlo simulation of the log-normal distribution function.
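A simulation of the kind described in Example 2.27 might be sketched as follows. This is not the code used to produce Figure 2.6; the function names, the seed, and the parameter values `S0 = 10`, `mu = 0.05`, `sigma = 0.2` are assumptions made purely for illustration (the values from Example 1.24 are not repeated here).

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist


def simulate_prices(n, S0, mu, sigma, T, seed=0):
    """Draw n points u from [0, 1] and map them to S(T) via (2.2)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    prices = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # inv_cdf requires 0 < u < 1
            u = rng.random()
        X = std_normal.inv_cdf(u)  # X = F^{-1}(u), as in Example 2.23
        prices.append(S0 * exp(mu * T + sigma * sqrt(T) * X))
    return prices


def empirical_cdf(sample, x):
    """Fraction of sample points not exceeding x."""
    return sum(1 for s in sample if s <= x) / len(sample)


def exact_cdf(x, S0, mu, sigma, T):
    """Exact distribution function of S(T) in model (2.2)."""
    if x <= 0:
        return 0.0
    return NormalDist(mu * T, sigma * sqrt(T)).cdf(log(x / S0))


sample = simulate_prices(100, 10.0, 0.05, 0.2, 1.0)
for x in (8.0, 10.0, 12.0):
    print(x, empirical_cdf(sample, x), exact_cdf(x, 10.0, 0.05, 0.2, 1.0))
```

With only 100 points the empirical distribution function is a fairly coarse step function; increasing the sample size brings it closer to the exact curve.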


The payoff H = h(S(T)) of a European derivative security is a random 
variable defined on the same probability space as S(T). 


Example 2.28
The distribution function $F_H$ of the call option payoff $H = (S(T) - K)^+$ can
be written as

\[
F_H(x) = P((S(T) - K)^+ \le x)
= \begin{cases}
0 & \text{if } x < 0, \\
P(S(T) \le K + x) & \text{if } x \ge 0
\end{cases}
= \begin{cases}
0 & \text{if } x < 0, \\
F_{S(T)}(K + x) & \text{if } x \ge 0
\end{cases}
\]

in terms of the distribution function $F_{S(T)}$ of $S(T)$.

In Figure 2.7 we can see the distribution function $F_H$ for a call option
with expiry time $T = 1$ and strike price $K = 8$ written on a log-normally
distributed stock given by (2.2) with parameters $\mu$ and $\sigma$ as in
Example 1.24. The dot indicates the value $F_H(0) = F_{S(T)}(K)$ at $x = 0$, where
$F_H$ has a discontinuity. For comparison, the log-normal distribution
function $F_{S(T)}$ is shown as a dotted line.
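The formula for the payoff distribution function translates directly into code. The sketch below is illustrative; the function names are our own, and the log-normal parameters passed to `lognormal_cdf` are hypothetical stand-ins for those of Example 1.24.

```python
from math import log
from statistics import NormalDist


def lognormal_cdf(s, mu, sigma):
    """Distribution function of exp(X) with X ~ N(mu, sigma^2)."""
    if s <= 0:
        return 0.0
    return NormalDist(mu, sigma).cdf(log(s))


def call_payoff_cdf(x, K, F_S):
    """F_H(x) for H = (S(T) - K)^+, given the distribution function F_S of S(T)."""
    if x < 0:
        return 0.0
    return F_S(K + x)


# Assumed parameters: K = 8 as in the example, mu = 2.0 and sigma = 0.3 hypothetical.
F_S = lambda s: lognormal_cdf(s, 2.0, 0.3)
print(call_payoff_cdf(-0.01, 8.0, F_S))  # left of the jump
print(call_payoff_cdf(0.0, 8.0, F_S))    # size of the jump: P(S(T) <= K)
```

The jump at 0 equals the probability that the option expires worthless.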


Figure 2.7 Distribution function for the call option payoff in Example 2.28.


Exercise 2.13 Find and sketch the distribution function of the payoff
$H = (K - S(T))^+$ of a put option with expiry time $T = 1$ and strike
price $K = 8$ written on a stock having a log-normal distribution with
parameters $\mu$ and $\sigma$ as in Example 1.24.


As a simple alternative to the log-normal model, we consider discrete
stock prices by revisiting and extending Example 1.7. Suppose that the
initial stock price is positive, $S(0) > 0$, and assume that there are $N$
multiplicative price changes by a factor $1 + U$ or $1 + D$ (where $-1 < D < U$)
with respective probabilities $p$ and $1 - p$ (with $0 < p < 1$), so that at a
future time $T > 0$ the stock price $S(T)$ will reach one of the possible values
$S(0)(1 + U)^n (1 + D)^{N-n}$ with probability $\binom{N}{n} p^n (1 - p)^{N-n}$ for $n = 0, 1, \ldots, N$;
see [DMFM].


Example 2.29
A simple choice of the probability space for such a discrete model is
$\Omega = \{0, 1, \ldots, N\}$ equipped with the binomial probability; see Example 1.13.
The future stock price can be considered as a random variable on $\Omega$ defined
by $S(T)(\omega) = S(0)(1 + U)^{\omega} (1 + D)^{N - \omega}$ for each $\omega \in \{0, 1, \ldots, N\}$. The
payoff $H = h(S(T))$ of any European option is a function of the stock price
$S(T)$ at expiry time $T$, so it can also be considered as a random variable on
$\Omega = \{0, 1, \ldots, N\}$.


Example 2.30
Next we turn our attention to path-dependent options, whose payoff
depends not just on the stock price $S(T)$ at expiry time $T$ but also on the stock
prices $S(t)$ at intermediate times $0 < t < T$. In the discrete model the time
interval is divided into $N$ steps $0 < t_1 < \cdots < t_N = T$. An arithmetic
Asian call with payoff

\[
H = \left(\frac{1}{N} \sum_{n=1}^{N} S(t_n) - K\right)^+
\]

can serve as an example of a path-dependent option. It is a call option on
the average of stock prices sampled at times $t_1, \ldots, t_N$.

To describe $H$ as a random variable we need a richer probability space
than that in Example 2.29. Let us take $\Omega$ to be the set of all sequences of
length $N$ consisting of symbols U or D to keep track of up and down stock
price moves. To any such sequence $\omega = (\omega_1, \ldots, \omega_N) \in \Omega$ we assign the
probability

\[
P(\{\omega\}) = p^{k_N} (1 - p)^{N - k_N},
\]

where $k_N$ is the number of occurrences of U and $N - k_N$ is the number of
occurrences of D in the sequence $\omega$. For any $n = 1, \ldots, N$ we then put

\[
S(t_n)(\omega) = S(0)(1 + U)^{k_n} (1 + D)^{n - k_n},
\]

where $k_n$ is the number of occurrences of U and $n - k_n$ is the number of
occurrences of D among the first $n$ entries in the sequence $\omega = (\omega_1, \ldots, \omega_N) \in \Omega$.
This is the binomial tree model studied in [DMFM].
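For small N the probability space of Example 2.30 can be enumerated explicitly. The following Python sketch is for illustration only; the function names and the numerical inputs (S(0) = 100, U = 0.1, D = -0.1, K = 100, p = 0.6) are our own assumptions.

```python
from itertools import product


def sample_space(N):
    """All sequences of length N built from the symbols 'U' and 'D'."""
    return list(product("UD", repeat=N))


def path_probability(omega, p):
    """P({omega}) = p^k (1 - p)^(N - k), where k counts the U moves."""
    k = omega.count("U")
    return p**k * (1 - p) ** (len(omega) - k)


def path_prices(omega, S0, U, D):
    """Stock prices S(t_1), ..., S(t_N) along the path omega."""
    prices, s = [], S0
    for move in omega:
        s *= (1 + U) if move == "U" else (1 + D)
        prices.append(s)
    return prices


def asian_call_payoff(omega, S0, U, D, K):
    """Arithmetic Asian call: (average of S(t_n) - K)^+."""
    prices = path_prices(omega, S0, U, D)
    return max(sum(prices) / len(prices) - K, 0.0)


# List every path for N = 3 with its probability and payoff.
for omega in sample_space(3):
    print("".join(omega), path_probability(omega, 0.6),
          asian_call_payoff(omega, 100.0, 0.1, -0.1, 100.0))
```

Paths with the same number of U moves have the same probability and the same final price, but in general different payoffs, which is why the smaller space of Example 2.29 is not rich enough here.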


2.3 Expectation and variance 


The probability distribution provides detailed information about the values 
of a random variable along with the associated probabilities. Sometimes 
simplified information is more practical, hence the need for some numeri- 
cal characteristics of random variables. 

For a discrete random variable $X$ such that $\sum_{n=1}^{\infty} p_n = 1$ with
$p_n = P(X = x_n)$ for some $x_1, x_2, \ldots \in \mathbb{R}$, the expectation of $X$ is the
weighted average

\[
E(X) = \sum_{n=1}^{\infty} x_n p_n.
\]

In particular, in the case of a finite $\Omega = \{x_1, \ldots, x_N\}$ with uniform
probability we obtain the arithmetic average

\[
E(X) = \frac{1}{N} \sum_{n=1}^{N} x_n.
\]


Exercise 2.14 Find a discrete random variable X such that the ex- 
pectation of X is undefined. 


Exercise 2.15 A random variable $X$ on $\Omega = \{0, 1, 2, \ldots\}$ has the Poisson
distribution with parameter $\lambda > 0$ if $p_n = \frac{\lambda^n}{n!} e^{-\lambda}$ for $n = 0, 1, 2, \ldots$;
see Example 2.3. Find $E(X)$.


The above definition of the expectation is familiar in another context:
a non-negative discrete random variable $X$ is nothing other than a simple
function, as in Definition 1.25, with $A_n = \{X = x_n\}$ and $p_n = P(A_n)$ for
$n = 1, \ldots, N$. Its integral can therefore be written as

\[
\int_{\Omega} X \, dP = \sum_{n=1}^{N} x_n p_n = E(X).
\]

We use the fact that the expectation and integral coincide in this simple 
case to motivate the general definition. 


Definition 2.31
The expectation $E(X)$ of an integrable random variable $X$ defined on a
probability space $(\Omega, \mathcal{F}, P)$ is its integral over $\Omega$, that is,

\[
E(X) = \int_{\Omega} X \, dP.
\]


We can immediately deduce the following properties from Exercises 1.27, 
1.28 and 1.26, respectively. 


Proposition 2.32
If $X$ and $Y$ are integrable random variables on $(\Omega, \mathcal{F}, P)$ and $a, b \in \mathbb{R}$,
then:
(i) $E(aX + bY) = aE(X) + bE(Y)$ (linearity);
(ii) if $X \le Y$, then $E(X) \le E(Y)$ (monotonicity);
(iii) $|E(X)| \le E(|X|)$.


For a general $X$ it may not be obvious how to calculate the expectation
directly from the definition. However, the task becomes more tractable
when we examine the relationship between integrals with respect to $P$
and $P_X$.


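Theorem 2.33
If $X : \Omega \to \mathbb{R}$ is a random variable on $(\Omega, \mathcal{F}, P)$ and $g : \mathbb{R} \to \mathbb{R}$ is
integrable under $P_X$, then

\[
E(g(X)) = \int_{\Omega} g(X) \, dP = \int_{\mathbb{R}} g \, dP_X.
\]


Proof This follows immediately from Proposition 1.37, in which we take
$(\tilde{\Omega}, \tilde{\mathcal{F}}, \tilde{P}) = (\mathbb{R}, \mathcal{B}(\mathbb{R}), P_X)$ and $\varphi = X$. □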


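It will sometimes prove helpful to be explicit about the variable of
integration, so we allow ourselves the freedom to write

\[
\int_{\Omega} X \, dP = \int_{\Omega} X(\omega) \, dP(\omega)
\]

or

\[
\int_{\mathbb{R}} g \, dP_X = \int_{\mathbb{R}} g(x) \, dP_X(x)
\]

as the need arises. As a special case, for $g(x) = x$ we obtain

\[
E(X) = \int_{\mathbb{R}} x \, dP_X(x).
\]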


This formula enables us to compute the expectation when the distribution
of $X$ is known.

In particular, if $X$ has continuous distribution with density $f_X$, then
$P_X(B) = \int_B f_X \, dm$, where $m$ is the Lebesgue measure, and we can conclude,
with $g$ and $X$ as in Theorem 2.33, that

\[
E(g(X)) = \int_{\mathbb{R}} g(x) f_X(x) \, dm(x). \tag{2.3}
\]


Exercise 2.16 Prove (2.3), first for any simple function $g$, then using
monotone convergence for any non-negative measurable $g$, and finally
for any $g$ integrable under $P_X$.


In particular, whenever $g(x) = x$ is an integrable function under $P_X$, we
have

\[
E(X) = \int_{\mathbb{R}} x f_X(x) \, dm(x).
\]


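Example 2.34
If $X \sim N(0, 1)$, then $E(X) = \int_{-\infty}^{\infty} x f_X(x) \, dx$ by Proposition 1.39, since the
density $f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} x^2}$ is a continuous function on $\mathbb{R}$. We compute the
expectation

\[
E(X) = \int_{-\infty}^{\infty} x f_X(x) \, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x e^{-\frac{1}{2} x^2} \, dx = 0.
\]

The result is 0 since the integrand $g(x) = x e^{-\frac{1}{2} x^2}$ is an odd function, that is,
$g(-x) = -g(x)$, so that the integrals from $-\infty$ to 0 and from 0 to $\infty$ cancel
each other.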


Exercise 2.17 Let $X \sim N(\mu, \sigma^2)$ for some $\mu \in \mathbb{R}$ and $\sigma > 0$. Verify
that $E(X) = \mu$.


Exercise 2.18 Let $X$ have the Cauchy distribution with density

\[
f_X(x) = \frac{1}{\pi} \frac{1}{1 + x^2}.
\]

Show that the expectation of $X$ is undefined.


Example 2.35
Suppose that $P_X = p \delta_{x_0} + (1 - p) P$, where $P$ has density $f$. Then
$E(X) = p x_0 + (1 - p) \int_{\mathbb{R}} x f(x) \, dm(x)$. (Compare with Example 2.14.)


Exercise 2.19 Find E(X) for a random variable X with distribution 
Py = +60 + åP, where P is the exponential distribution with parameter 
$\lambda = 1$; see Example 2.10.


The expectation can be viewed as a weighted average of the values of a
random variable $X$, with the corresponding probabilities acting as weights.
However, this tells us nothing about how widely the values of $X$ are spread
around the expectation. Averaging the (positive) distance $|X - E(X)|$ would
be one possible measure of the spread, but the expectation of the squared
distance $(X - E(X))^2$ proves to be much more convenient and has become
the standard approach.


Definition 2.36
The variance of an integrable random variable $X$ is defined as

\[
\mathrm{Var}(X) = E[(X - E(X))^2].
\]

(Note that the variance may be equal to $\infty$.) The square root of the variance
yields the standard deviation

\[
\sigma_X = \sqrt{\mathrm{Var}(X)}.
\]


When $X$ is discrete with values $x_n$ and corresponding probabilities
$p_n = P(X = x_n)$ for $n = 1, 2, \ldots$, the variance becomes

\[
\mathrm{Var}(X) = \sum_{n=1}^{\infty} (x_n - E(X))^2 p_n,
\]

revealing its origins as a weighted sum of the squared distances between
the $x_n$ and $E(X)$.


Exercise 2.20 Show that the variance of a random variable with
Poisson distribution with parameter $\lambda$ is equal to $\lambda$.


The linearity of expectation implies that

\[
\mathrm{Var}(X) = E((X - E(X))^2) = E(X^2 - 2XE(X) + E(X)^2) = E(X^2) - E(X)^2,
\]

hence

\[
\mathrm{Var}(aX + b) = a^2 \mathrm{Var}(X), \qquad \sigma_{aX+b} = |a| \sigma_X.
\]


By Theorem 2.33 with $g(x) = (x - E(X))^2$,

\[
\mathrm{Var}(X) = \int_{\mathbb{R}} (x - E(X))^2 \, dP_X(x).
\]

If $X$ has a density, we obtain

\[
\mathrm{Var}(X) = \int_{\mathbb{R}} (x - E(X))^2 f_X(x) \, dm(x).
\]


Example 2.37
We saw that if $X$ has the standard normal distribution, its expectation is 0.
So $\mathrm{Var}(X) = \int_{-\infty}^{\infty} x^2 f_X(x) \, dx = 1$, as is easily seen using integration by
parts.


Exercise 2.21 Compute the variance of $X \sim N(\mu, \sigma^2)$.


This shows that for a normally distributed X the shape of the distribution 
is fully determined by the expectation and variance. 


Exercise 2.22 Compute E(X) and Var(X) in the following cases: 
(1) X has the exponential distribution with density (2.1). 
(2) X has the log-normal distribution with density (1.7). 


There are several useful inequalities involving expectation and/or variance
that enable one to estimate certain probabilities. Here we present just two 
simple examples. 


Proposition 2.38 (Markov inequality)
Let $f : \mathbb{R} \to [0, \infty)$ be an even function, that is, $f(-x) = f(x)$ for any
$x \in \mathbb{R}$, and non-decreasing for $x > 0$. If $X$ is a random variable and $c > 0$,
then

\[
P(|X| \ge c) \le \frac{E(f(X))}{f(c)}.
\]

Proof Define $A = \{|X| \ge c\}$. It follows that $f(|X|) \ge f(c)$ on $A$. Since $f$ is
an even function, we have $f(X) = f(|X|)$, so

\[
E(f(X)) = E(f(|X|)) = \int_{\Omega} f(|X|) \, dP \ge \int_A f(|X|) \, dP \ge f(c) P(A).
\]
□

In particular, we can apply Proposition 2.38 with $f(x) = x^2$ to obtain the
following inequality.


Corollary 2.39 (Chebyshev inequality)

\[
P(|X| \ge c) \le \frac{E(X^2)}{c^2}.
\]

For $X$ with finite mean $\mu$ and variance $\sigma^2$ we can apply Chebyshev's
inequality to $|X - \mu|$ and $c = k\sigma$ to obtain

\[
P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}.
\]

Thus, if $X$ has small variance, it will remain close to its mean with high
probability.
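To see how conservative the Chebyshev bound can be, one can compare it with an exact tail probability. The sketch below does this for the binomial distribution of Example 2.2; the function names are our own, and the choice N = 10, p = 0.5 simply mirrors Figure 2.1.

```python
from math import comb, sqrt


def binomial_pmf(N, p):
    """Probabilities P(X = n), n = 0, ..., N, for the binomial distribution."""
    return [comb(N, n) * p**n * (1 - p) ** (N - n) for n in range(N + 1)]


def chebyshev_check(N, p, k):
    """Return (mean, variance, P(|X - mean| >= k * sigma)) for binomial X."""
    pmf = binomial_pmf(N, p)
    mean = sum(n * q for n, q in enumerate(pmf))
    var = sum((n - mean) ** 2 * q for n, q in enumerate(pmf))
    c = k * sqrt(var)
    tail = sum(q for n, q in enumerate(pmf) if abs(n - mean) >= c)
    return mean, var, tail


for k in (1.0, 1.5, 2.0, 3.0):
    mean, var, tail = chebyshev_check(10, 0.5, k)
    print(k, tail, 1 / k**2)  # exact tail probability versus the bound 1/k^2
```

In this example the exact tail probabilities are considerably smaller than the bound, which holds for every distribution with finite variance and is therefore necessarily crude.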


2.4 Moments and characteristic functions 


Definition 2.40
For any $k \in \mathbb{N}$ we define the $k$th moment of a random variable $X$ as the
expectation of $X^k$, that is,

\[
m_k = E(X^k),
\]

and the $k$th central moment as

\[
\sigma_k = E((X - E(X))^k).
\]

Clearly, $m_1 = E(X)$ is just the expectation of $X$. Moreover, $\sigma_1 = 0$ and
$\sigma_2 = \mathrm{Var}(X)$.

We can ask whether a single function might suffice to identify all the
moments of a random variable $X$. It turns out that the expectation of $e^{itX}$
does the job. To make sense of such an expectation we define the integral
of a function $f : \Omega \to \mathbb{C}$ with values among complex numbers by means
of the integral of its real and imaginary parts: if $f = \mathrm{Re} f + i \, \mathrm{Im} f$ and both
the (real-valued) functions $\mathrm{Re} f$ and $\mathrm{Im} f$ are integrable, we set

\[
\int_{\Omega} f \, dP = \int_{\Omega} \mathrm{Re} f \, dP + i \int_{\Omega} \mathrm{Im} f \, dP.
\]

In other words, we set

\[
E(f) = E(\mathrm{Re} f) + i E(\mathrm{Im} f).
\]


Definition 2.41
Let $X : \Omega \to \mathbb{R}$ be a random variable. Then $\phi_X : \mathbb{R} \to \mathbb{C}$ defined by

\[
\phi_X(t) = E(e^{itX}) = E(\cos(tX)) + i E(\sin(tX))
\]

for all $t \in \mathbb{R}$ is called the characteristic function of $X$.


To compute $\phi_X$ it is sufficient to know the probability distribution of $X$:

\[
\phi_X(t) = \int_{\mathbb{R}} e^{itx} \, dP_X(x),
\]

and if $X$ has a density, this reduces to

\[
\phi_X(t) = \int_{\mathbb{R}} e^{itx} f_X(x) \, dm(x).
\]

Vice versa, it turns out (see the inversion formula, Theorem 5.41) that the
probability distribution of $X$ is uniquely determined by the characteristic
function $\phi_X$.

The function $\phi_X$ has the advantage that it always exists because the random
variable $e^{itX}$ is bounded. We begin by stating the simplest properties
of $\phi_X$ and give several examples as exercises.


Exercise 2.23 Show that for any random variable $X$
(1) $\phi_X(0) = 1$;
(2) $|\phi_X(t)| \le 1$ for all $t \in \mathbb{R}$.


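The characteristic function $\phi_X(t)$ is continuous in $t$. In fact

\[
|\phi_X(t + h) - \phi_X(t)| \le \int_{\mathbb{R}} |e^{ihx} - 1| \, dP_X(x).
\]

The right-hand side does not depend on $t$ and tends to 0 as $h \to 0$ by
dominated convergence, since the integrand is bounded by 2,
so it follows that $\phi_X(t)$ is uniformly continuous.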


Exercise 2.24 Let $X$ have the Poisson distribution with parameter
$\lambda > 0$. Find its characteristic function.


Exercise 2.25 Verify that if $X$ is a random variable with the standard
normal distribution, then $\phi_X(t) = e^{-\frac{1}{2} t^2}$.


Exercise 2.26 Let $Y = aX + b$, where $X, Y$ are random variables and
$a, b \in \mathbb{R}$. Show that for all $t \in \mathbb{R}$

\[
\phi_Y(t) = e^{itb} \phi_X(at).
\]

Use this relation to find $\phi_Y$ when $Y$ is normally distributed with mean $\mu$
and variance $\sigma^2$.


As hinted above, there is a close relationship between the characteristic 
function and the moments of a random variable. 


Theorem 2.42 
Let X be a random variable and let n be a non-negative integer such that
$E(|X|^n) < \infty$. Then

\[ E(X^n) = \frac{1}{i^n}\phi_X^{(n)}(0). \]

Proof First observe that for any $x \in \mathbb{R}$

\[ e^{ix} - 1 - ix = i\int_0^x (e^{is} - 1)\,ds. \]

Estimating the integral gives the inequality

\[ |e^{ix} - 1 - ix| = \left|\int_0^x (e^{is} - 1)\,ds\right| \le \left|\int_0^x |e^{is} - 1|\,ds\right| \le 2|x|. \tag{2.4} \]

We show by induction that

\[ \phi_X^{(n)}(t) = E((iX)^n e^{itX}) \]

for every random variable X such that $E(|X|^n) < \infty$. For n = 0 this is
trivially satisfied: $\phi_X^{(0)}(t) = \phi_X(t) = E(e^{itX})$. Now suppose that the assertion
has already been established for some n = 0, 1, 2, ..., and take any random
variable X such that $E(|X|^{n+1}) < \infty$. It follows that

\[ E(|X|^n) = E(1_{\{|X| \le 1\}}|X|^n) + E(1_{\{|X| > 1\}}|X|^n)
   \le 1 + E(1_{\{|X| > 1\}}|X|^{n+1}) \le 1 + E(|X|^{n+1}) < \infty. \]

By the induction hypothesis we therefore have

\[ \frac{\phi_X^{(n)}(t+h) - \phi_X^{(n)}(t)}{h} - E((iX)^{n+1}e^{itX})
   = E\left((iX)^n e^{itX}\,\frac{e^{ihX} - 1 - ihX}{h}\right). \]

By (2.4) the random variables

\[ Y_n(h) = (iX)^n e^{itX}\,\frac{e^{ihX} - 1 - ihX}{h} \]

are dominated by $2|X|^{n+1}$. The derivative of the exponential function gives
$\lim_{h \to 0} \frac{e^{ihX} - 1}{h} = iX$, hence $\lim_{h \to 0} Y_n(h) = 0$, and by the version of
the dominated convergence theorem in Exercise 1.36 it follows that
$\lim_{h \to 0} E(Y_n(h)) = 0$. This shows that

\[ \phi_X^{(n+1)}(t) = E((iX)^{n+1} e^{itX}), \]

completing the induction argument. Putting t = 0 proves the theorem. □
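
As a numerical illustration of Theorem 2.42 (not a substitute for the proof), the following Python sketch takes X to be the outcome of a fair die, an assumption made purely for this example. It computes the characteristic function from the distribution and approximates its first two derivatives at 0 by central differences. The function names and the step size `h` are illustrative choices, not part of the text.

```python
import cmath

# Illustrative assumption: X is the outcome of a fair die.
VALUES = [1, 2, 3, 4, 5, 6]
PROBS = [1 / 6] * 6


def phi(t):
    """Characteristic function phi_X(t) = E(exp(itX)) of the discrete X."""
    return sum(p * cmath.exp(1j * t * x) for x, p in zip(VALUES, PROBS))


def first_derivative_at_zero(h=1e-4):
    """Central-difference approximation of phi_X'(0)."""
    return (phi(h) - phi(-h)) / (2 * h)


def second_derivative_at_zero(h=1e-4):
    """Central-difference approximation of phi_X''(0)."""
    return (phi(h) - 2 * phi(0) + phi(-h)) / (h * h)


# Theorem 2.42: E(X^n) = phi_X^(n)(0) / i^n.
moment1 = (first_derivative_at_zero() / 1j).real
moment2 = (second_derivative_at_zero() / (1j ** 2)).real

# Moments computed directly from the distribution, for comparison.
exact1 = sum(p * x for x, p in zip(VALUES, PROBS))       # E(X)
exact2 = sum(p * x ** 2 for x, p in zip(VALUES, PROBS))  # E(X^2)
```

The finite-difference values agree with the directly computed moments only up to discretisation and rounding error, so any comparison should use a tolerance.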


Exercise 2.27 Use the formula in Theorem 2.42 to obtain an expression
for the variance of X in terms of the characteristic function $\phi_X$ and
its derivatives, evaluated at zero.


Exercise 2.28 Suppose that $X \sim N(0, \sigma^2)$. Show that for any odd n
we have $E(X^n) = 0$, and for any even n,

\[ E(X^n) = 1 \times 3 \times \cdots \times (n-1) \times \sigma^n. \]

In the general case, when $X \sim N(\mu, \sigma^2)$, show that

\[ E(X^2) = \mu^2 + \sigma^2, \]
\[ E(X^3) = \mu^3 + 3\mu\sigma^2, \]
\[ E(X^4) = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4. \]


Remark 2.43 

If X is a random variable with values in {1, 2, ...}, the probability
generating function $G_X(s) = E(s^X) = \sum_{n \ge 1} s^n p_n$ allows us (in principle) to
reconstruct the sequence $p_n = P(X = n)$. Setting $s = e^t$ turns $G_X$ into the
moment generating function $m_X(t) = E[e^{tX}]$ of X.

The moment generating function is not always finite-valued, not even
in an open interval around the origin. However, if X has a finite moment
generating function $m_X$ on some interval $(-a, a)$, then we can read off the
kth moment of X directly as the value of its kth derivative at 0, namely
$E[X^k] = m_X^{(k)}(0)$.


3 


Product measure and independence 


3.1 Product measure 

3.2 Joint distribution 

3.3 Iterated integrals 

3.4 Random vectors in $\mathbb{R}^n$

3.5 Independence 

3.6 Covariance 

3.7 Proofs by means of d-systems 


In a financial market, the price of stocks in one company may well influ- 
ence those of another: if company A suffers a decline in the market share 
for their product, its competitor B may have an opportunity to increase 
their sales, and thus the shares in B may increase in price while those of 
A decline. On the other hand, if the overall market for a particular product 
contracts, we may find that the shares of two rival companies will decline 
simultaneously, though not necessarily at the same rate. 

Modelling the relationships between the prices of different shares is
therefore of particular interest. We can regard the prices as random
variables X, Y defined on a common probability space $(\Omega, \mathcal{F}, P)$, and
endeavour to describe their joint behaviour.

In Chapter 2 the distribution of a single random variable X was defined
as the probability measure on $\mathbb{R}$ given by

\[ P_X(B) = P(X \in B) \]

for all Borel sets $B \subset \mathbb{R}$. In the case of two random variables X, Y a natural
extension would be to write the joint distribution as

\[ P_{X,Y}(B) = P(X \in B_1, Y \in B_2) \tag{3.1} \]

for any $B \subset \mathbb{R}^2$ of the form $B = B_1 \times B_2$, where $B_1, B_2 \subset \mathbb{R}$ are Borel sets.
However, the family of such sets B is not a $\sigma$-field, and we need to extend
it further to be able to consider $P_{X,Y}$ as a probability measure. This will lead
to the notion of Borel sets in $\mathbb{R}^2$.


Exercise 3.1 Show that the family of sets $B \subset \mathbb{R}^2$ of the form
$B = B_1 \times B_2$, where $B_1, B_2 \subset \mathbb{R}$ are Borel sets, is not a $\sigma$-field.


In particular, for a continuous random variable X with density
$f_X : \mathbb{R} \to [0, \infty)$ the probability distribution $P_X$ can be expressed as

\[ P_X(B) = \int_B f_X(x)\,dm(x) \]

for any Borel set $B \subset \mathbb{R}$, where m is the Lebesgue measure on the real
line $\mathbb{R}$. To extend this we need to introduce the notion of joint density
$f_{X,Y} : \mathbb{R}^2 \to [0, \infty)$ and define Lebesgue measure $m_2$ on the plane $\mathbb{R}^2$, so that the
joint probability distribution can be written as

\[ P_{X,Y}(B) = \int_B f_{X,Y}(x, y)\,dm_2(x, y) \]

for any Borel set B in $\mathbb{R}^2$.


3.1 Product measure 


When constructing Lebesgue measure m on $\mathbb{R}$, we started by taking
$m(I) = b - a$ to be the length of any interval $I = (a, b)$, and extended this to a
measure defined on all Borel sets in $\mathbb{R}$, that is, on the smallest $\sigma$-field
containing all intervals. The idea behind the construction of Lebesgue
measure $m_2$ on $\mathbb{R}^2$ is similar. For any rectangle $R = (a, b) \times (c, d)$ we take
$m_2(R) = (b - a)(d - c)$ to be the surface area, and would like to extend this
to a measure on the family of all Borel sets in $\mathbb{R}^2$ defined as the smallest
$\sigma$-field containing all rectangles.


Product of finite measures 


It is not much more effort to consider measures on arbitrary spaces. This
has the advantage of wider applicability. Consider arbitrary measure spaces
$(\Omega_1, \mathcal{F}_1, \mu_1)$ and $(\Omega_2, \mathcal{F}_2, \mu_2)$ with finite measures $\mu_1, \mu_2$. We want to
construct a measure $\mu$ on the Cartesian product $\Omega_1 \times \Omega_2$ such that

\[ \mu(A_1 \times A_2) = \mu_1(A_1)\mu_2(A_2) \tag{3.2} \]

for any $A_1 \in \mathcal{F}_1$ and $A_2 \in \mathcal{F}_2$. The construction has several steps.
In $\Omega_1 \times \Omega_2$ a measurable rectangle is a product $A_1 \times A_2$ for which
$A_1 \in \mathcal{F}_1$ and $A_2 \in \mathcal{F}_2$. We denote the family of all measurable rectangles by

\[ \mathcal{R} = \{A_1 \times A_2 : A_1 \in \mathcal{F}_1, A_2 \in \mathcal{F}_2\}. \]

Then we consider the smallest $\sigma$-field containing the family $\mathcal{R}$ of
measurable rectangles, which we denote by

\[ \mathcal{F}_1 \otimes \mathcal{F}_2 = \bigcap \{\mathcal{F} : \mathcal{F} \text{ is a } \sigma\text{-field on } \Omega_1 \times \Omega_2 \text{ and } \mathcal{R} \subset \mathcal{F}\} \tag{3.3} \]

and call it the product $\sigma$-field.


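Exercise 3.2 Show that the product $\sigma$-field $\mathcal{F}_1 \otimes \mathcal{F}_2$ is the smallest
$\sigma$-field such that the projections

\[ \mathrm{Pr}_1 : \Omega_1 \times \Omega_2 \to \Omega_1, \qquad \mathrm{Pr}_1(\omega_1, \omega_2) = \omega_1, \]
\[ \mathrm{Pr}_2 : \Omega_1 \times \Omega_2 \to \Omega_2, \qquad \mathrm{Pr}_2(\omega_1, \omega_2) = \omega_2 \]

are measurable.


Definition 3.1
The family of Borel sets on the plane can be defined as

\[ \mathcal{B}(\mathbb{R}^2) = \mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R}). \]


Exercise 3.3 Show that the smallest $\sigma$-field on $\mathbb{R}^2$ containing the
family

\[ \{I_1 \times I_2 : I_1, I_2 \text{ are intervals in } \mathbb{R}\} \]

is equal to $\mathcal{B}(\mathbb{R}^2)$.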


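Since the domain of a measure is a $\sigma$-field by definition, the construction
described in (3.3) is an example of quite a general idea.


Definition 3.2
Let $\Omega$ be a non-empty set. For a family $\mathcal{A}$ of subsets of $\Omega$ we denote the
smallest $\sigma$-field on $\Omega$ that contains $\mathcal{A}$ by

\[ \sigma(\mathcal{A}) = \bigcap \{\mathcal{F} : \mathcal{F} \text{ is a } \sigma\text{-field on } \Omega,\ \mathcal{A} \subset \mathcal{F}\}. \]

We call $\sigma(\mathcal{A})$ the $\sigma$-field generated by $\mathcal{A}$.


Example 3.3
The Borel sets in $\mathbb{R}$ form the $\sigma$-field generated by the family $\mathcal{I}$ of open
intervals in $\mathbb{R}$,

\[ \mathcal{B}(\mathbb{R}) = \sigma(\mathcal{I}). \]


Example 3.4
The product $\sigma$-field is generated by the family $\mathcal{R}$ of measurable rectangles,

\[ \mathcal{F}_1 \otimes \mathcal{F}_2 = \sigma(\mathcal{R}). \]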


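The next step in constructing a measure $\mu$ on $\mathcal{F}_1 \otimes \mathcal{F}_2$ that satisfies (3.2)
is to define sections of a subset $A \subset \Omega_1 \times \Omega_2$. Namely, for any $\omega_2 \in \Omega_2$ we
put

\[ A_{\omega_2} = \{\omega_1 \in \Omega_1 : (\omega_1, \omega_2) \in A\}, \]

and, similarly, for any $\omega_1 \in \Omega_1$

\[ A_{\omega_1} = \{\omega_2 \in \Omega_2 : (\omega_1, \omega_2) \in A\}. \]


Exercise 3.4 Let $A \in \mathcal{F}_1 \otimes \mathcal{F}_2$. Show that $A_{\omega_2} \in \mathcal{F}_1$ and $A_{\omega_1} \in \mathcal{F}_2$ for
any $\omega_1 \in \Omega_1$ and $\omega_2 \in \Omega_2$.


In particular, for a measurable rectangle $A = A_1 \times A_2$ with $A_1 \in \mathcal{F}_1$
and $A_2 \in \mathcal{F}_2$, we obtain $A_{\omega_2} = A_1$ if $\omega_2 \in A_2$ and $A_{\omega_2} = \emptyset$ otherwise. So
$\omega_2 \mapsto \mu_1(A_{\omega_2}) = 1_{A_2}(\omega_2)\mu_1(A_1)$ is a simple function. Hence, from (3.2),

\[ \mu(A_1 \times A_2) = \mu_1(A_1)\mu_2(A_2) = \int_{\Omega_2} \mu_1(A_{\omega_2})\,d\mu_2(\omega_2). \]

By symmetry, we can also write

\[ \mu(A_1 \times A_2) = \mu_1(A_1)\mu_2(A_2) = \int_{\Omega_1} \mu_2(A_{\omega_1})\,d\mu_1(\omega_1). \]

This motivates the general formula defining $\mu$ on $\mathcal{F}_1 \otimes \mathcal{F}_2$. Namely, for any
$A \in \mathcal{F}_1 \otimes \mathcal{F}_2$ we propose to write

\[ \mu(A) = \int_{\Omega_1} \mu_2(A_{\omega_1})\,d\mu_1(\omega_1) = \int_{\Omega_2} \mu_1(A_{\omega_2})\,d\mu_2(\omega_2), \]

along with the conjecture that the last two integrals are well defined and
equal to one another. We already know from Exercise 3.4 that $\mu_1(A_{\omega_2})$
and $\mu_2(A_{\omega_1})$ make sense since $A_{\omega_2} \in \mathcal{F}_1$ and $A_{\omega_1} \in \mathcal{F}_2$. Moreover, for
the integrals to make sense we need the function $\omega_1 \mapsto \mu_2(A_{\omega_1})$ to be
$\mathcal{F}_1$-measurable and $\omega_2 \mapsto \mu_1(A_{\omega_2})$ to be $\mathcal{F}_2$-measurable. Our objective is
therefore to prove the following result.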
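
On finite sets every subset is measurable and the integrals become finite sums, so the proposed formula for the measure of a set via its sections can be checked directly. The Python sketch below does this for a small example; the point weights, the set `A` and all function names are hypothetical choices made only for illustration.

```python
# Illustrative finite measure spaces: weights play the role of mu_1 and mu_2.
MU1 = {"a": 0.5, "b": 1.5, "c": 2.0}   # mu_1({w1}) for w1 in Omega_1
MU2 = {0: 0.25, 1: 0.75}               # mu_2({w2}) for w2 in Omega_2

# An arbitrary subset A of Omega_1 x Omega_2 (not a rectangle).
A = {("a", 0), ("a", 1), ("b", 1), ("c", 0)}


def section_at_w1(A, w1):
    """Section {w2 : (w1, w2) in A}, a subset of Omega_2."""
    return {w2 for (u, w2) in A if u == w1}


def section_at_w2(A, w2):
    """Section {w1 : (w1, w2) in A}, a subset of Omega_1."""
    return {w1 for (w1, v) in A if v == w2}


def measure(mu, subset):
    """Measure of a finite set as the sum of its point weights."""
    return sum(mu[w] for w in subset)


# Integrate mu_2 of the sections with respect to mu_1, and vice versa.
via_mu1 = sum(measure(MU2, section_at_w1(A, w1)) * MU1[w1] for w1 in MU1)
via_mu2 = sum(measure(MU1, section_at_w2(A, w2)) * MU2[w2] for w2 in MU2)

# Direct sum of products of point weights over A, for comparison.
direct = sum(MU1[w1] * MU2[w2] for (w1, w2) in A)
```

In this finite setting the agreement of the two iterated sums is just a reordering of a finite double sum; the general measurable case is what requires proof.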


Theorem 3.5 
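Suppose that $\mu_1$ and $\mu_2$ are finite measures defined on the $\sigma$-fields $\mathcal{F}_1$ and
$\mathcal{F}_2$, respectively. Then:

(i) for any $A \in \mathcal{F}_1 \otimes \mathcal{F}_2$ the functions

\[ \omega_1 \mapsto \mu_2(A_{\omega_1}), \qquad \omega_2 \mapsto \mu_1(A_{\omega_2}) \]

are measurable, respectively, with respect to $\mathcal{F}_1$ and $\mathcal{F}_2$;
(ii) for any $A \in \mathcal{F}_1 \otimes \mathcal{F}_2$ the following integrals are well defined and
equal to one another:

\[ \int_{\Omega_1} \mu_2(A_{\omega_1})\,d\mu_1(\omega_1) = \int_{\Omega_2} \mu_1(A_{\omega_2})\,d\mu_2(\omega_2); \]

(iii) the function $\mu : \mathcal{F}_1 \otimes \mathcal{F}_2 \to [0, \infty)$ defined by

\[ \mu(A) = \int_{\Omega_1} \mu_2(A_{\omega_1})\,d\mu_1(\omega_1) = \int_{\Omega_2} \mu_1(A_{\omega_2})\,d\mu_2(\omega_2) \tag{3.4} \]

for each $A \in \mathcal{F}_1 \otimes \mathcal{F}_2$ is a measure on the $\sigma$-field $\mathcal{F}_1 \otimes \mathcal{F}_2$;
(iv) $\mu$ is the only measure on $\mathcal{F}_1 \otimes \mathcal{F}_2$ such that

\[ \mu(A_1 \times A_2) = \mu_1(A_1)\mu_2(A_2) \]

for each $A_1 \in \mathcal{F}_1$ and $A_2 \in \mathcal{F}_2$.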


The proof of this theorem can be found in Section 3.7. It is quite tech- 
nical and can be omitted on first reading. The theorem shows that the fol- 
lowing definition is well posed. 


Definition 3.6
Let $\mu_1$ and $\mu_2$ be finite measures defined on the $\sigma$-fields $\mathcal{F}_1$ and $\mathcal{F}_2$,
respectively. We call $\mu$ defined by (3.4) the product measure on $\mathcal{F}_1 \otimes \mathcal{F}_2$ and
denote it by $\mu_1 \otimes \mu_2$.


Product of $\sigma$-finite measures


Observe that Theorem 3.5 and Definition 3.6 do not apply directly to the
Lebesgue measure m on $\mathbb{R}$ because $m(\mathbb{R}) = \infty$. However, the definition of
product measure can be extended to a large class of measures with the
following property, which does include Lebesgue measure.


Definition 3.7
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. We say that $\mu$ is a $\sigma$-finite measure
whenever $\Omega = \bigcup_{n=1}^{\infty} A_n$ for some sequence of events $A_n \in \mathcal{F}$ such that $\mu(A_n) < \infty$
and $A_n \subset A_{n+1}$ for each n = 1, 2, ....


Example 3.8
Taking for example $A_n = [-n, n]$, we can see that Lebesgue measure m is
indeed $\sigma$-finite.


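Definition 3.9
Let $(\Omega_1, \mathcal{F}_1, \mu_1)$ and $(\Omega_2, \mathcal{F}_2, \mu_2)$ be measure spaces with $\sigma$-finite measures
$\mu_1, \mu_2$. The product measure $\mu_1 \otimes \mu_2$ is constructed as follows.
(i) Take two sequences of events $A_n \in \mathcal{F}_1$ with $\mu_1(A_n) < \infty$ and
$A_n \subset A_{n+1}$, and $B_n \in \mathcal{F}_2$ with $\mu_2(B_n) < \infty$ and $B_n \subset B_{n+1}$ for n = 1, 2, ...
such that

\[ \Omega_1 = \bigcup_{n=1}^{\infty} A_n, \qquad \Omega_2 = \bigcup_{n=1}^{\infty} B_n. \]

(ii) For each n = 1, 2, ... denote by $\mu_1^{(n)}$ the restriction of $\mu_1$ to $A_n$
defined by

\[ \mu_1^{(n)}(A) = \mu_1(A) \quad \text{for each } A \in \mathcal{F}_1 \text{ such that } A \subset A_n, \]

and by $\mu_2^{(n)}$ the restriction of $\mu_2$ to $B_n$, defined analogously; clearly
$\mu_1^{(n)}$ and $\mu_2^{(n)}$ are finite measures.

(iii) Define $\mu_1 \otimes \mu_2$ for any $C \in \mathcal{F}_1 \otimes \mathcal{F}_2$ as

\[ (\mu_1 \otimes \mu_2)(C) = \lim_{n \to \infty} \left(\mu_1^{(n)} \otimes \mu_2^{(n)}\right)(C \cap (A_n \times B_n)). \]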


Exercise 3.5 Show that the limit in Definition 3.9 (iii) exists and
does not depend on the choice of the sequences $A_n$, $B_n$ in (i).


Exercise 3.6 Show that $\mu_1 \otimes \mu_2$ from Definition 3.9 (iii) is a $\sigma$-finite
measure.


Example 3.10
In Example 3.8 we observed that the Lebesgue measure m on $\mathbb{R}$ is $\sigma$-finite.
The construction in Definition 3.9 therefore applies, and yields the product
measure $m \otimes m$ defined on the Borel sets $\mathcal{B}(\mathbb{R}^2) = \mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R})$ in $\mathbb{R}^2$, which
will be denoted by

\[ m_2 = m \otimes m \]

and called the Lebesgue measure on $\mathbb{R}^2$.


Exercise 3.7 Verify that $m_2(R) = (b - a)(d - c)$ for any rectangle
$R = (a, b) \times (c, d)$ in $\mathbb{R}^2$.


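Example 3.11
We can extend the construction of Lebesgue measure to $\mathbb{R}^n$ for any
n = 2, 3, ... by iterating the product of measures.

Thus, we put, for example

\[ m_3 = m \otimes m \otimes m \]

for the Lebesgue measure defined on the Borel sets

\[ \mathcal{B}(\mathbb{R}^3) = \mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R}) \]

in $\mathbb{R}^3$. Note that this triple product can be interpreted in two ways: as
$m \otimes (m \otimes m)$, a measure on $\mathbb{R} \times \mathbb{R}^2$, or as $(m \otimes m) \otimes m$, a measure on $\mathbb{R}^2 \times \mathbb{R}$. For
simplicity, we identify both $\mathbb{R} \times \mathbb{R}^2$ and $\mathbb{R}^2 \times \mathbb{R}$ with $\mathbb{R}^3$ and thus make no
distinction between $(m \otimes m) \otimes m$ and $m \otimes (m \otimes m)$.

In a similar manner we define the Lebesgue measure

\[ m_n = \underbrace{m \otimes m \otimes \cdots \otimes m}_{n} \]

on the Borel sets

\[ \mathcal{B}(\mathbb{R}^n) = \underbrace{\mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R}) \otimes \cdots \otimes \mathcal{B}(\mathbb{R})}_{n} \]

in $\mathbb{R}^n$.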


3.2 Joint distribution 


A random variable on a probability space $(\Omega, \mathcal{F}, P)$ is a function
$X : \Omega \to \mathbb{R}$ such that $\{X \in B\} \in \mathcal{F}$ for every Borel set $B \in \mathcal{B}(\mathbb{R})$ on the real line $\mathbb{R}$;
see Definition 2.16. We extend this to the case of functions with values
in $\mathbb{R}^2$.


Definition 3.12
We call $Z : \Omega \to \mathbb{R}^2$ a random vector if $\{Z \in B\} \in \mathcal{F}$ for every Borel set
$B \in \mathcal{B}(\mathbb{R}^2)$ on the plane $\mathbb{R}^2$.


Exercise 3.8 Show that $X, Y : \Omega \to \mathbb{R}$ are random variables if and
only if $(X, Y) : \Omega \to \mathbb{R}^2$ is a random vector.


We are now ready to define the joint distribution for a pair of random
variables $X, Y : \Omega \to \mathbb{R}$. Exercise 3.8 ensures that $\{(X, Y) \in B\} \in \mathcal{F}$, so it
makes sense to consider the probability $P((X, Y) \in B)$ for any $B \in \mathcal{B}(\mathbb{R}^2)$.


Definition 3.13
The joint distribution of the pair X, Y is the probability measure $P_{X,Y}$ on
$(\mathbb{R}^2, \mathcal{B}(\mathbb{R}^2))$ given by

\[ P_{X,Y}(B) = P((X, Y) \in B) \]

for any $B \in \mathcal{B}(\mathbb{R}^2)$.
In particular, when $B = B_1 \times B_2$ for some Borel sets $B_1, B_2 \subset \mathbb{R}$, then

\[ P_{X,Y}(B) = P((X, Y) \in B_1 \times B_2) = P(X \in B_1, Y \in B_2). \]


The distributions $P_X$ and $P_Y$ of the individual random variables X and Y
can be reconstructed from the joint distribution $P_{X,Y}$. Namely, for any Borel
set $B \in \mathcal{B}(\mathbb{R})$

\[ P_X(B) = P(X \in B) = P(X \in B, Y \in \mathbb{R}) = P_{X,Y}(B \times \mathbb{R}), \]
\[ P_Y(B) = P(Y \in B) = P(X \in \mathbb{R}, Y \in B) = P_{X,Y}(\mathbb{R} \times B). \]

We call $P_X$ and $P_Y$ the marginal distributions of $P_{X,Y}$. On the other hand,
as shown in Exercise 3.9, the marginal distributions $P_X$ and $P_Y$ are by no
means enough to construct the joint distribution $P_{X,Y}$.


Exercise 3.9 On the two-element probability space $\Omega = \{\omega_1, \omega_2\}$
with uniform probability consider two pairs of random variables $X_1, Y_1$
and $X_2, Y_2$ defined in the table below.

              $X_1$   $Y_1$   $X_2$   $Y_2$
$\omega_1$     110     60     110     40
$\omega_2$      90     40      90     60

Show that $P_{X_1} = P_{X_2}$, $P_{Y_1} = P_{Y_2}$, but $P_{X_1,Y_1} \ne P_{X_2,Y_2}$.
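
The claim of Exercise 3.9 can be checked by direct enumeration before attempting a written argument. The Python sketch below encodes the table as dictionaries and tabulates each distribution as a map from values to probabilities; the helper `distribution` and the variable names are illustrative, not notation from the text.

```python
from collections import Counter
from fractions import Fraction

# Uniform probability on the two-element space {w1, w2}.
P = {"w1": Fraction(1, 2), "w2": Fraction(1, 2)}

# Random variables from the table in Exercise 3.9.
X1 = {"w1": 110, "w2": 90}
Y1 = {"w1": 60, "w2": 40}
X2 = {"w1": 110, "w2": 90}
Y2 = {"w1": 40, "w2": 60}


def distribution(*variables):
    """Distribution of a tuple of random variables as a Counter of point masses."""
    dist = Counter()
    for omega, prob in P.items():
        value = tuple(v[omega] for v in variables)
        dist[value] += prob
    return dist


# Marginal distributions coincide, the joint distributions do not.
same_x_marginal = distribution(X1) == distribution(X2)
same_y_marginal = distribution(Y1) == distribution(Y2)
same_joint = distribution(X1, Y1) == distribution(X2, Y2)
```

This only confirms the statement computationally; the exercise still asks for the reasoning in terms of the measures involved.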


Definition 3.14
The joint distribution function of X, Y is the function $F_{X,Y} : \mathbb{R}^2 \to [0, 1]$
defined by

\[ F_{X,Y}(x, y) = P(X \le x, Y \le y). \]

In other words,

\[ F_{X,Y}(x, y) = P_{X,Y}((-\infty, x] \times (-\infty, y]). \]


Exercise 3.10 Show that the joint distribution function $(x, y) \mapsto F_{X,Y}(x, y)$
is non-decreasing in each of its arguments and that for any
$a, b \in \mathbb{R}$

\[ \lim_{y \to \infty} F_{X,Y}(a, y) = F_X(a), \]
\[ \lim_{x \to \infty} F_{X,Y}(x, b) = F_Y(b). \]


Exercise 3.11 For $a, b \in \mathbb{R}$ find $P(X > a, Y > b)$ in terms of $F_X$, $F_Y$
and $F_{X,Y}$.


Definition 3.15
If

\[ P_{X,Y}(B) = \int_B f_{X,Y}(x, y)\,dm_2(x, y) \]

for all $B \in \mathcal{B}(\mathbb{R}^2)$, where $f_{X,Y} : \mathbb{R}^2 \to \mathbb{R}$ is integrable under the Lebesgue
measure $m_2$, then $f_{X,Y}$ is called the joint density of X and Y, and the random
variables X, Y are said to be jointly continuous.


If X, Y are jointly continuous, the joint distribution and joint density are
related by

\[ F_{X,Y}(a, b) = \int_{(-\infty, a] \times (-\infty, b]} f_{X,Y}(x, y)\,dm_2(x, y). \]


Example 3.16
The bivariate normal density is given by

\[ f_{X,Y}(x_1, x_2) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left(-\frac{x_1^2 - 2\rho x_1 x_2 + x_2^2}{2(1 - \rho^2)}\right), \tag{3.5} \]

where $\rho \in (-1, 1)$ is a fixed parameter, whose meaning will become clear
in due course. To check that it is a density we need to show that

\[ \int_{\mathbb{R}^2} f_{X,Y}(x_1, x_2)\,dm_2(x_1, x_2) = 1. \]

This will be done in Exercise 3.12.


We need techniques for calculating such integrals. We achieve this in the 
next section by considering product measures. 


3.3 Iterated integrals 


As in Section 3.1, we consider measure spaces $(\Omega_1, \mathcal{F}_1, \mu_1)$ and $(\Omega_2, \mathcal{F}_2, \mu_2)$.
In order to integrate functions defined on $\Omega_1 \times \Omega_2$ with respect to the
product measure we seek to exploit integration with respect to the
measures $\mu_1$ and $\mu_2$ individually.

For a function $f : \Omega_1 \times \Omega_2 \to [-\infty, \infty]$ the sections of f are defined by
$\omega_1 \mapsto f(\omega_1, \omega_2)$ for any $\omega_2 \in \Omega_2$ and $\omega_2 \mapsto f(\omega_1, \omega_2)$ for any $\omega_1 \in \Omega_1$.
They are functions from $\Omega_1$ and, respectively, from $\Omega_2$ to $[-\infty, \infty]$.


Iterated integrals with respect to finite measures 


We first consider the issue of measurability of the sections and their inte- 
grals. 


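Proposition 3.17
Suppose that $\mu_1$ and $\mu_2$ are finite measures. If a non-negative function
$f : \Omega_1 \times \Omega_2 \to [0, \infty]$ is measurable with respect to $\mathcal{F}_1 \otimes \mathcal{F}_2$, then:
(i) the section $\omega_1 \mapsto f(\omega_1, \omega_2)$ is $\mathcal{F}_1$-measurable for each $\omega_2 \in \Omega_2$, and
$\omega_2 \mapsto f(\omega_1, \omega_2)$ is $\mathcal{F}_2$-measurable for each $\omega_1 \in \Omega_1$;
(ii) the functions

\[ \omega_1 \mapsto \int_{\Omega_2} f(\omega_1, \omega_2)\,d\mu_2(\omega_2), \qquad \omega_2 \mapsto \int_{\Omega_1} f(\omega_1, \omega_2)\,d\mu_1(\omega_1) \]

are, respectively, $\mathcal{F}_1$-measurable and $\mathcal{F}_2$-measurable.

Proof First we approximate f by simple functions

\[ f_n(\omega_1, \omega_2) = \begin{cases} \dfrac{k}{2^n} & \text{if } f(\omega_1, \omega_2) \in \left[\dfrac{k}{2^n}, \dfrac{k+1}{2^n}\right),\ k = 0, 1, \ldots, n2^n - 1, \\ n & \text{if } f(\omega_1, \omega_2) \ge n, \end{cases} \]

which form a non-decreasing sequence such that $\lim_{n \to \infty} f_n = f$.

(i) The sections of simple functions are also simple functions. It is clear
that the sections of $f_n$ converge to those of f, and since measurability is
preserved in the limit (Exercise 1.19), the first claim of the proposition is proved.

(ii) If $A \in \mathcal{F}_1 \otimes \mathcal{F}_2$, we know from Theorem 3.5 (i) that

\[ \omega_1 \mapsto \int_{\Omega_2} 1_A(\omega_1, \omega_2)\,d\mu_2(\omega_2) = \int_{\Omega_2} 1_{A_{\omega_1}}(\omega_2)\,d\mu_2(\omega_2) = \mu_2(A_{\omega_1}), \]
\[ \omega_2 \mapsto \int_{\Omega_1} 1_A(\omega_1, \omega_2)\,d\mu_1(\omega_1) = \int_{\Omega_1} 1_{A_{\omega_2}}(\omega_1)\,d\mu_1(\omega_1) = \mu_1(A_{\omega_2}) \]

are measurable functions. It follows by linearity (Exercise 1.21) that the
integrals of the sections of $f_n$ are measurable functions. By monotone
convergence, see Theorem 1.31, the integrals of the sections of f are limits of
the integrals of the sections of $f_n$, and are therefore also measurable. This
completes the proof. □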


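We are ready to show that the integral over the product space can be
computed as an iterated integral.


Theorem 3.18 (Fubini)
Suppose that $\mu_1$ and $\mu_2$ are finite measures. If $f : \Omega_1 \times \Omega_2 \to [-\infty, \infty]$ is
integrable under the product measure $\mu_1 \otimes \mu_2$, then the sections

\[ \omega_2 \mapsto f(\omega_1, \omega_2), \qquad \omega_1 \mapsto f(\omega_1, \omega_2) \]

are $\mu_1$-a.e. integrable under $\mu_2$ and, respectively, $\mu_2$-a.e. integrable
under $\mu_1$, and the functions

\[ \omega_1 \mapsto \int_{\Omega_2} f(\omega_1, \omega_2)\,d\mu_2(\omega_2), \qquad \omega_2 \mapsto \int_{\Omega_1} f(\omega_1, \omega_2)\,d\mu_1(\omega_1) \]

are integrable under $\mu_1$ and, respectively, under $\mu_2$. Moreover,

\[ \int_{\Omega_1 \times \Omega_2} f(\omega_1, \omega_2)\,d(\mu_1 \otimes \mu_2)(\omega_1, \omega_2)
   = \int_{\Omega_1} \left(\int_{\Omega_2} f(\omega_1, \omega_2)\,d\mu_2(\omega_2)\right) d\mu_1(\omega_1) \]
\[ \qquad = \int_{\Omega_2} \left(\int_{\Omega_1} f(\omega_1, \omega_2)\,d\mu_1(\omega_1)\right) d\mu_2(\omega_2). \]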

Proof We prove the first equality. This will be done in a number of steps.

• If $f = 1_A$ is the indicator function for some $A \in \mathcal{F}_1 \otimes \mathcal{F}_2$, then the desired
  equality becomes

  \[ (\mu_1 \otimes \mu_2)(A) = \int_{\Omega_1} \mu_2(A_{\omega_1})\,d\mu_1(\omega_1), \]

  and this is satisfied by the definition of the product measure
  (Definition 3.6).

• If f is a non-negative simple function, then it is a linear combination
  of indicator functions, and linearity of the integral for simple functions
  (Exercise 1.11) verifies the equality in this case.

• Next, if f is a non-negative measurable function, then it can be expressed
  as the limit of a non-decreasing sequence of non-negative simple
  functions; see Proposition 1.28. Applying the monotone convergence
  theorem (Theorem 1.31), first to the inner integral over $\Omega_2$ and then to each
  side of the target equality, verifies the equality.

• If f is a non-negative integrable function, then, in addition, the integral
  $\int_{\Omega_1 \times \Omega_2} f(\omega_1, \omega_2)\,d(\mu_1 \otimes \mu_2)(\omega_1, \omega_2)$ on the left-hand side of the equality is
  finite, and therefore so is the integral $\int_{\Omega_1} \left(\int_{\Omega_2} f(\omega_1, \omega_2)\,d\mu_2(\omega_2)\right) d\mu_1(\omega_1)$
  on the right-hand side. This means that $\omega_1 \mapsto \int_{\Omega_2} f(\omega_1, \omega_2)\,d\mu_2(\omega_2)$ is
  integrable under $\mu_1$. This in turn means that the section $\omega_2 \mapsto f(\omega_1, \omega_2)$
  is $\mu_1$-a.e. integrable under $\mu_2$.

• Finally, for any function f integrable under $\mu_1 \otimes \mu_2$, we take the
  decomposition $f = f^+ - f^-$ into the positive and negative parts; see (1.9). Since
  $f^+$ and $f^-$ are non-negative integrable functions, they satisfy the equality
  in question, with the integrals on both sides of the equality being finite.
  This, in turn, gives the identity for f, along with the conclusion that
  $\omega_1 \mapsto \int_{\Omega_2} f(\omega_1, \omega_2)\,d\mu_2(\omega_2)$ is integrable under $\mu_1$ and $\omega_2 \mapsto f(\omega_1, \omega_2)$
  is $\mu_1$-a.e. integrable under $\mu_2$.

The proof of the second identity and of the integrability of the functions
$\omega_2 \mapsto \int_{\Omega_1} f(\omega_1, \omega_2)\,d\mu_1(\omega_1)$ and $\omega_1 \mapsto f(\omega_1, \omega_2)$ is similar. □


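Iterated integrals with respect to $\sigma$-finite measures


Before we can handle iterated integrals with respect to Lebesgue measure,
we need to extend Fubini’s theorem to $\sigma$-finite measures.

Suppose that $\mu_1, \mu_2$ are $\sigma$-finite measures, and let $f : \Omega_1 \times \Omega_2 \to [-\infty, \infty]$
be an integrable function under the product measure $\mu_1 \otimes \mu_2$. We proceed
as follows.

• Take sequences of events $A_n$ and $B_n$ as in part (i), and take $\mu_1^{(n)}, \mu_2^{(n)}$ to be the
  finite measures from part (ii) of Definition 3.9. Then

  \[ \int_{A_n \times B_n} f(\omega_1, \omega_2)\,d(\mu_1^{(n)} \otimes \mu_2^{(n)})(\omega_1, \omega_2)
     = \int_{A_n} \left(\int_{B_n} f(\omega_1, \omega_2)\,d\mu_2^{(n)}(\omega_2)\right) d\mu_1^{(n)}(\omega_1) \]
  \[ \qquad = \int_{B_n} \left(\int_{A_n} f(\omega_1, \omega_2)\,d\mu_1^{(n)}(\omega_1)\right) d\mu_2^{(n)}(\omega_2). \]

  The integrals in these identities can be written as

  \[ \int_{\Omega_1 \times \Omega_2} 1_{A_n \times B_n}(\omega_1, \omega_2) f(\omega_1, \omega_2)\,d(\mu_1 \otimes \mu_2)(\omega_1, \omega_2) \]
  \[ \qquad = \int_{\Omega_1} \left(\int_{\Omega_2} 1_{A_n \times B_n}(\omega_1, \omega_2) f(\omega_1, \omega_2)\,d\mu_2(\omega_2)\right) d\mu_1(\omega_1) \]
  \[ \qquad = \int_{\Omega_2} \left(\int_{\Omega_1} 1_{A_n \times B_n}(\omega_1, \omega_2) f(\omega_1, \omega_2)\,d\mu_1(\omega_1)\right) d\mu_2(\omega_2). \]

• Next, if f is non-negative, we can use monotone convergence, that is,
  Theorem 1.31, to obtain

  \[ \int_{\Omega_1 \times \Omega_2} f(\omega_1, \omega_2)\,d(\mu_1 \otimes \mu_2)(\omega_1, \omega_2)
     = \int_{\Omega_1} \left(\int_{\Omega_2} f(\omega_1, \omega_2)\,d\mu_2(\omega_2)\right) d\mu_1(\omega_1) \]
  \[ \qquad = \int_{\Omega_2} \left(\int_{\Omega_1} f(\omega_1, \omega_2)\,d\mu_1(\omega_1)\right) d\mu_2(\omega_2) \]

  in the limit as $n \to \infty$ because $A_n$, $B_n$ and $A_n \times B_n$ for n = 1, 2, ... are
  non-decreasing sequences of sets such that $\bigcup_{n=1}^{\infty} A_n = \Omega_1$, $\bigcup_{n=1}^{\infty} B_n = \Omega_2$
  and $\bigcup_{n=1}^{\infty} (A_n \times B_n) = \Omega_1 \times \Omega_2$.

• Finally, for any function $f : \Omega_1 \times \Omega_2 \to [-\infty, \infty]$ integrable under $\mu_1 \otimes \mu_2$, we know
  that the latter identities hold for the positive and negative parts $f^+$, $f^-$.
  Since $f = f^+ - f^-$, we obtain the same identities for f by the linearity of
  integrals. Moreover, since $\int_{\Omega_1 \times \Omega_2} |f(\omega_1, \omega_2)|\,d(\mu_1 \otimes \mu_2)(\omega_1, \omega_2) < \infty$, we
  can conclude that the integrals on the right-hand side are finite, and so
  the functions

  \[ \omega_1 \mapsto \int_{\Omega_2} f(\omega_1, \omega_2)\,d\mu_2(\omega_2) \quad \text{and} \quad \omega_2 \mapsto \int_{\Omega_1} f(\omega_1, \omega_2)\,d\mu_1(\omega_1) \]

  are integrable.

This extends Fubini’s theorem to $\sigma$-finite measures and, in particular, to
iterated integrals with respect to Lebesgue measure.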


Example 3.19
We now apply these results to the joint probability $P_{X,Y}$ of two random
variables X, Y. If X, Y are jointly continuous with density $f_{X,Y}$, then

\[ P_{X,Y}(B_1 \times B_2) = \int_{B_1 \times B_2} f_{X,Y}(x, y)\,dm_2(x, y)
   = \int_{B_1} \left(\int_{B_2} f_{X,Y}(x, y)\,dm(y)\right) dm(x) \]

for any $B_1, B_2 \in \mathcal{B}(\mathbb{R})$. In particular, when $B_1 = (-\infty, a]$ and $B_2 = (-\infty, b]$
for some $a, b \in \mathbb{R}$ and the joint density is a continuous function, this
becomes

\[ F_{X,Y}(a, b) = \int_{-\infty}^{a} \left(\int_{-\infty}^{b} f_{X,Y}(x, y)\,dy\right) dx, \]

with Riemann integrals on the right-hand side, so we obtain

\[ f_{X,Y}(a, b) = \frac{\partial^2}{\partial b\,\partial a} F_{X,Y}(a, b). \]


Proposition 3.20
If X, Y are jointly continuous random variables with density $f_{X,Y}$, then X
and Y are (individually) continuous with densities

\[ f_X(x) = \int_{\mathbb{R}} f_{X,Y}(x, y)\,dm(y), \qquad f_Y(y) = \int_{\mathbb{R}} f_{X,Y}(x, y)\,dm(x). \tag{3.6} \]

Proof By Fubini’s theorem,

\[ P_X(B) = P_{X,Y}(B \times \mathbb{R}) = \int_{B \times \mathbb{R}} f_{X,Y}(x, y)\,dm_2(x, y)
   = \int_B \left(\int_{\mathbb{R}} f_{X,Y}(x, y)\,dm(y)\right) dm(x) \]

for any $B \in \mathcal{B}(\mathbb{R})$, where $x \mapsto \int_{\mathbb{R}} f_{X,Y}(x, y)\,dm(y)$ is a non-negative
integrable function. This means that

\[ f_X(x) = \int_{\mathbb{R}} f_{X,Y}(x, y)\,dm(y). \]

The proof of the second identity is similar. □


We call $f_X$ and $f_Y$ given by (3.6) the marginal densities of the joint
distribution of X, Y.
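
The marginal density formula can be explored numerically. The Python sketch below approximates the integral of the bivariate normal density of Example 3.16 over its second argument by a midpoint rule and compares the result with the standard normal density at one point. Replacing the Lebesgue integral by a truncated Riemann sum is an assumption that is reasonable here because the integrand is continuous and decays rapidly; the value of `RHO`, the truncation bounds and the function names are illustrative.

```python
import math

RHO = 0.5  # illustrative value of the parameter rho in (-1, 1)


def bivariate_normal_density(x1, x2, rho=RHO):
    """Bivariate normal density from Example 3.16."""
    norm = 2 * math.pi * math.sqrt(1 - rho ** 2)
    quad = (x1 ** 2 - 2 * rho * x1 * x2 + x2 ** 2) / (2 * (1 - rho ** 2))
    return math.exp(-quad) / norm


def marginal_density_x(x, lower=-12.0, upper=12.0, steps=4000):
    """Midpoint-rule approximation of the integral of f_{X,Y}(x, y) over y."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        y = lower + (k + 0.5) * width
        total += bivariate_normal_density(x, y)
    return total * width


def standard_normal_density(x):
    """Density of N(0, 1)."""
    return math.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)


approx = marginal_density_x(0.3)
exact = standard_normal_density(0.3)
```

A numerical agreement at a few points is only a sanity check; it does not replace the calculation requested in Exercise 3.12.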


Exercise 3.12
Confirm that $f_{X,Y}$ given by (3.5) is a density, that is,

\[ \int_{\mathbb{R}^2} f_{X,Y}(x_1, x_2)\,dm_2(x_1, x_2) = 1, \]

and that $f_X$, $f_Y$ given by (3.6) are standard normal densities.


Exercise 3.13 Suppose X, Y have joint density $f_{X,Y}$. Show that X + Y
is a continuous random variable with density

\[ f_{X+Y}(z) = \int_{\mathbb{R}} f_{X,Y}(x, z - x)\,dm(x). \]


Exercise 3.14 Suppose that random variables X, Y have joint density
$f_{X,Y}(x, y) = e^{-(x+y)}$ when x > 0 and y > 0, and $f_{X,Y}(x, y) = 0$ otherwise.
Find the density of X/Y.


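3.4 Random vectors in $\mathbb{R}^n$


When defining the joint distribution of two random variables, we found it
helpful to consider random vectors (Definition 3.12) as functions from $\Omega$
to $\mathbb{R}^2$. We can extend this to n random variables. As for $\mathbb{R}^2$, we can define
the Borel sets in $\mathbb{R}^n$ by means of products of Borel sets in $\mathbb{R}$, generalising
the notation introduced in (3.3).


Definition 3.21
Given n = 2, 3, ..., define the $\sigma$-field of Borel sets on $\mathbb{R}^n$ as

\[ \mathcal{B}(\mathbb{R}^n) = \underbrace{\mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R}) \otimes \cdots \otimes \mathcal{B}(\mathbb{R})}_{n}, \]

as in Example 3.11. In other words, $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{R}_n)$, the $\sigma$-field on $\mathbb{R}^n$
generated by the collection

\[ \mathcal{R}_n = \{B_1 \times \cdots \times B_n : B_1, \ldots, B_n \in \mathcal{B}(\mathbb{R})\}. \]


Exercise 3.15 Show that $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{I}_n)$, where

\[ \mathcal{I}_n = \{I_1 \times \cdots \times I_n : I_1, \ldots, I_n \text{ are intervals in } \mathbb{R}\}. \]


Definition 3.22
A map $X = (X_1, X_2, \ldots, X_n)$ from $(\Omega, \mathcal{F}, P)$ to $\mathbb{R}^n$ is called a random vector
if $\{X \in B\} \in \mathcal{F}$ for every $B \in \mathcal{B}(\mathbb{R}^n)$.


Exercise 3.16 Show that $X = (X_1, X_2, \ldots, X_n)$ is a random vector if
and only if $X_1, X_2, \ldots, X_n$ are random variables.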


Definition 3.23
Let $X = (X_1, X_2, \ldots, X_n)$ be a random vector from $\Omega$ to $\mathbb{R}^n$. The joint
distribution of X (equivalently, of $X_1, \ldots, X_n$) is the probability $P_X$ on
$(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ defined by

\[ P_X(B) = P(X \in B) \quad \text{for each } B \in \mathcal{B}(\mathbb{R}^n). \]

The joint distribution function of $X = (X_1, X_2, \ldots, X_n)$ is the function
$F_X : \mathbb{R}^n \to [0, 1]$ given by

\[ F_X(x_1, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n) \]

for any $x_1, \ldots, x_n \in \mathbb{R}$.


Having made sense of Lebesgue measure $m_n$ on $\mathbb{R}^n$, we can use it, just
as we did for $m_2$, to define the joint density of n random variables.


Definition 3.24
We say that a random vector $X = (X_1, X_2, \ldots, X_n)$ has joint density if the
joint distribution $P_X$ can be written as

\[ P_X(B) = \int_B f_X(x)\,dm_n(x) \quad \text{for each } B \in \mathcal{B}(\mathbb{R}^n) \]

for some integrable function $f_X : \mathbb{R}^n \to [0, \infty]$, where $m_n$ denotes Lebesgue
measure on $\mathbb{R}^n$. In this case the random variables $X_1, X_2, \ldots, X_n$ are said
to be jointly continuous. The $P_{X_i}$ are called the marginal distributions
of $P_X$.


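If $P_X$ has a density $f_X$ on $\mathbb{R}^n$, then the marginal distribution $P_{X_i}$ has
density on $\mathbb{R}$ given by an integral relative to Lebesgue measure $m_{n-1}$ on $\mathbb{R}^{n-1}$.
Indeed, using Fubini’s theorem for $\sigma$-finite measures repeatedly, we have

\[ P_{X_i}(B) = P(X \in \mathbb{R}^{i-1} \times B \times \mathbb{R}^{n-i}) = \int_{\mathbb{R}^{i-1} \times B \times \mathbb{R}^{n-i}} f_X(x)\,dm_n(x) \]
\[ \qquad = \int_{\mathbb{R}^{i-1} \times B} \left(\int_{\mathbb{R}^{n-i}} f_X(x', x, x'')\,dm_{n-i}(x'')\right) dm_i(x', x) \]
\[ \qquad = \int_B \left(\int_{\mathbb{R}^{i-1}} \left(\int_{\mathbb{R}^{n-i}} f_X(x', x, x'')\,dm_{n-i}(x'')\right) dm_{i-1}(x')\right) dm(x) \]
\[ \qquad = \int_B \left(\int_{\mathbb{R}^{n-1}} f_X(x', x, x'')\,dm_{n-1}(x', x'')\right) dm(x) \]

for any $B \in \mathcal{B}(\mathbb{R})$, where for any $x' \in \mathbb{R}^{i-1}$, $x \in \mathbb{R}$ and $x'' \in \mathbb{R}^{n-i}$ we identify
$(x', x, x'')$ with a point in $\mathbb{R}^n$, $(x', x)$ with a point in $\mathbb{R}^i$, and $(x', x'')$ with a
point in $\mathbb{R}^{n-1}$. It follows that

\[ f_{X_i}(x) = \int_{\mathbb{R}^{n-1}} f_X(x', x, x'')\,dm_{n-1}(x', x''). \]

This extends the result in Proposition 3.20.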


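Example 3.25
We call $X = (X_1, X_2, \ldots, X_n)$ a Gaussian random vector if it has joint
density given for all $x \in \mathbb{R}^n$ by

\[ f_X(x) = \frac{1}{\sqrt{(2\pi)^n \det\Sigma}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right), \tag{3.7} \]

where $\mu \in \mathbb{R}^n$, $\Sigma$ is a non-singular positive definite (that is, $x^T \Sigma x > 0$
when $x \in \mathbb{R}^n$ is any non-zero vector) symmetric $n \times n$ matrix, $\Sigma^{-1}$ is the
inverse matrix of $\Sigma$, $\det\Sigma$ denotes the determinant of $\Sigma$, and $(x - \mu)^T$ is the
transpose of the vector $x - \mu$ in $\mathbb{R}^n$. We say that (3.7) is a multivariate
normal density.

In particular, the bivariate normal density (3.5) from Example 3.16 fits
into this pattern since it can be written as

\[ \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left(-\frac{x_1^2 - 2\rho x_1 x_2 + x_2^2}{2(1 - \rho^2)}\right)
   = \frac{1}{\sqrt{(2\pi)^2 \det\Sigma}} \exp\left(-\frac{1}{2} x^T \Sigma^{-1} x\right), \]

where $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and where $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$ is a positive definite
symmetric matrix with determinant $\det\Sigma = 1 - \rho^2 > 0$ and inverse
$\Sigma^{-1} = \frac{1}{1 - \rho^2}\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}$.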
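
The identification of the bivariate normal density with the matrix form of the multivariate normal density can be checked numerically at a few points. The Python sketch below evaluates both expressions for n = 2, zero mean and the 2 × 2 matrix with unit diagonal and off-diagonal entries equal to the parameter; the sample points, the parameter value 0.3 and the function names are illustrative assumptions.

```python
import math


def bivariate_formula(x1, x2, rho):
    """Bivariate normal density written with the explicit quadratic in x1, x2."""
    quad = (x1 ** 2 - 2 * rho * x1 * x2 + x2 ** 2) / (2 * (1 - rho ** 2))
    return math.exp(-quad) / (2 * math.pi * math.sqrt(1 - rho ** 2))


def matrix_formula(x1, x2, rho):
    """Multivariate normal density with n = 2, mu = 0, Sigma = [[1, rho], [rho, 1]]."""
    det = 1 - rho ** 2
    # Inverse of Sigma: (1 / det) * [[1, -rho], [-rho, 1]].
    inv = [[1 / det, -rho / det], [-rho / det, 1 / det]]
    x = [x1, x2]
    # Quadratic form x^T Sigma^{-1} x.
    quad = sum(x[i] * inv[i][j] * x[j] for i in range(2) for j in range(2))
    return math.exp(-quad / 2) / math.sqrt((2 * math.pi) ** 2 * det)


points = [(0.0, 0.0), (1.0, -0.5), (-2.0, 0.7)]
max_gap = max(
    abs(bivariate_formula(a, b, 0.3) - matrix_formula(a, b, 0.3))
    for a, b in points
)
```

The largest gap `max_gap` should be at the level of floating-point rounding, reflecting that the two expressions are algebraically identical.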


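Exercise 3.17 Show that (3.7) is indeed a density, that is,

\[ \int_{\mathbb{R}^n} \frac{1}{\sqrt{(2\pi)^n \det\Sigma}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right) dm_n(x) = 1. \]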


3.5 Independence 


One of the key concepts in probability theory is that of independence. We
consider it in various forms: for random variables, events and $\sigma$-fields.
Each time we start with just two such objects before moving to the
general case.


Two independent random variables

We begin by examining two random variables whose joint distribution is
the product of their individual distributions.


Definition 3.26
If random variables X, Y satisfy

\[ P_{X,Y}(B_1 \times B_2) = P_X(B_1)P_Y(B_2) \tag{3.8} \]

for all choices of $B_1, B_2 \in \mathcal{B}(\mathbb{R})$, we say that X and Y are independent.


In other words, the joint distribution of two independent variables X, Y
is the product measure $P_{X,Y} = P_X \otimes P_Y$. We can conveniently express this
in terms of distribution functions.


Theorem 3.27
Random variables X, Y are independent if and only if their joint distribution
function $F_{X,Y}$ is the product of their individual distribution functions, that
is,

\[ F_{X,Y}(x, y) = F_X(x)F_Y(y) \quad \text{for any } x, y \in \mathbb{R}. \]


The necessity of this condition is immediate, but the proof of its
sufficiency is somewhat technical and is given in Section 3.7.

When X and Y are jointly continuous, their independence can be
expressed in terms of densities.


Proposition 3.28
If X, Y are jointly continuous with density $f_{X,Y}$, then they are independent if
and only if

\[ f_{X,Y}(x, y) = f_X(x)f_Y(y), \quad m_2\text{-a.e.}, \tag{3.9} \]

that is, if and only if

\[ m_2\left(\{(x, y) \in \mathbb{R}^2 : f_{X,Y}(x, y) \ne f_X(x)f_Y(y)\}\right) = 0. \]


Proof Proposition 3.20 confirms that X and Y have densities $f_X$, $f_Y$. For
any $B_1, B_2 \in \mathcal{B}(\mathbb{R})$ we have

\[ P_{X,Y}(B_1 \times B_2) = \int_{B_1 \times B_2} f_{X,Y}(x, y)\,dm_2(x, y), \]

while

\[ P_X(B_1)P_Y(B_2) = \left(\int_{B_1} f_X(x)\,dm(x)\right)\left(\int_{B_2} f_Y(y)\,dm(y)\right) \]
\[ \qquad = \int_{B_1} \left(\int_{B_2} f_X(x)f_Y(y)\,dm(y)\right) dm(x) \]
\[ \qquad = \int_{B_1 \times B_2} f_X(x)f_Y(y)\,dm_2(x, y) \]

by Fubini’s theorem. Now if (3.9) holds, it follows immediately that (3.8)
does too. Conversely, if (3.8) holds, then we see that

\[ \int_{B_1 \times B_2} f_{X,Y}(x, y)\,dm_2(x, y) = \int_{B_1 \times B_2} f_X(x)f_Y(y)\,dm_2(x, y) \]

for any Borel sets $B_1, B_2 \in \mathcal{B}(\mathbb{R})$. It follows from Lemma 3.58 in
Section 3.7 that

\[ \int_B f_{X,Y}(x, y)\,dm_2(x, y) = \int_B f_X(x)f_Y(y)\,dm_2(x, y) \]

for any Borel set $B \in \mathcal{B}(\mathbb{R}^2)$ because, by Theorem 1.35, the integrals on
both sides of the last equality are measures when regarded as functions of
$B \in \mathcal{B}(\mathbb{R}^2)$. This implies (3.9) by virtue of Exercise 1.30. □


As a by-product we obtain the following result. 


Corollary 3.29 

If X and Y are (individually) continuous and independent, then they are 
also jointly continuous, with joint density given by the product of their 
individual densities. 


This result fails when the random variables are not independent, as the 
next exercise shows. 


Exercise 3.18 Give an example of continuous random variables X, Y
defined on the same probability space that are not jointly continuous.


Exercise 3.19 Suppose the joint density $f_{X,Y}$ of random variables
X, Y is the bivariate normal density (3.5) with $\rho = 0$. (We call $f_{X,Y}$
the standard bivariate normal density when $\rho = 0$.) Show that X and
Y are independent.


Exercise 3.20 Show that if X and Y are jointly continuous and
independent, then their sum X + Y has density

\[ f_{X+Y}(z) = \int_{\mathbb{R}} f_X(x)f_Y(z - x)\,dm(x). \]

This density is called the convolution of $f_X$ and $f_Y$.
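
To see the convolution of two densities at work, the following Python sketch approximates the convolution integral by a midpoint rule for two standard normal densities and compares the result with the density of N(0, 2), which is the known distribution of the sum of two independent standard normal variables. The truncation bounds, step count, evaluation point and function names are illustrative assumptions.

```python
import math


def normal_density(x, variance=1.0):
    """Density of N(0, variance)."""
    return math.exp(-x ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def convolution(f, g, z, lower=-12.0, upper=12.0, steps=4000):
    """Midpoint-rule approximation of the integral of f(x) g(z - x) over x."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * width
        total += f(x) * g(z - x)
    return total * width


z = 0.8
approx = convolution(normal_density, normal_density, z)
# For independent X, Y ~ N(0, 1) the sum X + Y has distribution N(0, 2).
exact = normal_density(z, variance=2.0)
```

The same `convolution` helper can be applied to other pairs of densities, provided the truncation interval covers the region where the integrand is not negligible.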


Families of independent random variables 


The following definition is a natural extension of the concept of indepen- 
dence from 2 to n random variables. 


Definition 3.30
Let $P_X$ be the joint distribution of a random vector $X = (X_1, X_2, \ldots, X_n)$.
The random variables $X_1, X_2, \ldots, X_n$ are said to be independent if for every
choice of Borel sets $B_1, B_2, \ldots, B_n \in \mathcal{B}(\mathbb{R})$

\[ P_X(B_1 \times B_2 \times \cdots \times B_n) = \prod_{i=1}^{n} P_{X_i}(B_i), \]

or in other words if

\[ P_X = P_{X_1} \otimes P_{X_2} \otimes \cdots \otimes P_{X_n}. \]

An arbitrary family $\mathcal{X}$ of random variables is called independent if for
every finite subset $\{X_1, X_2, \ldots, X_n\} \subset \mathcal{X}$ the random variables $X_1, X_2, \ldots, X_n$
are independent.


For n random variables their independence can again be expressed in
terms of the distribution function. The proof follows just like in the
previous section, which dealt with the case n = 2.


Theorem 3.31
$X_1, X_2, \ldots, X_n$ are independent if and only if the joint distribution function
for the random vector $X = (X_1, X_2, \ldots, X_n)$ satisfies

\[ F_X(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} F_{X_i}(x_i) \quad \text{for any } x_1, x_2, \ldots, x_n \in \mathbb{R}. \]


The description in terms of densities follows just as for the case n = 2.


Theorem 3.32
If a random vector $X = (X_1, \ldots, X_n)$ has joint density $f_X$, then $X_1, \ldots, X_n$
are independent if and only if

\[ f_X(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i) \quad m_n\text{-a.e.} \tag{3.10} \]


Exercise 3.21 Prove Theorems 3.31 and 3.32.


Exercise 3.22 Suppose that the random vector $X = (X_1, X_2, \ldots, X_n)$
has joint density (3.7). Show that $E(X_i) = \mu_i$ for each i = 1, ..., n.
Also prove that if $\Sigma$ is a diagonal matrix, then $X_1, X_2, \ldots, X_n$ are
independent.


Two independent events 


Our principal interest is in random variables, but the concept of indepen- 
dence can be defined more widely: for A; = {X € Bı} and A; = {Y € B2} 
we see that (3.8) becomes 


P(A, N Az) = PAPA). 
We turn this into a general definition for arbitrary events A1, Az E F. 


Definition 3.33 
Let (Q, F, P) be a probability space. Events A1,A2 € F are said to be 
independent if 


P(A, N Az) = P(A P2). 


Example 3.34 

A fair die is thrown twice. Thus $\Omega = \{(i, j) : i, j = 1, 2, 3, 4, 5, 6\}$ and
each pair occurs with probability $\frac{1}{36}$. Let $A$ be the event that the first throw
is odd, and $B$ the event that the second throw is odd. Then $P(A) = \frac{1}{2} = P(B)$,
while $P(A \cap B) = \frac{1}{4}$. Thus $A, B$ are independent events. However,
for $D = \{(i, j) \in \Omega : i + j > 6\}$ the events $A, D$ are not independent since
$P(D) = \frac{7}{12}$ and $P(A \cap D) = \frac{1}{4} \neq \frac{1}{2} \times \frac{7}{12}$.
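
The probabilities in this example can be checked by enumerating all 36 outcomes. The following Python sketch is an illustration only; the helper name `prob` and the encoding of events as predicates are our own choices, not notation from the text.

```python
from fractions import Fraction
from itertools import product

# Sample space for two throws of a fair die: 36 equally likely pairs.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes (i, j)."""
    favourable = sum(1 for w in omega if event(w))
    return Fraction(favourable, len(omega))

A = lambda w: w[0] % 2 == 1          # first throw is odd
B = lambda w: w[1] % 2 == 1          # second throw is odd
D = lambda w: w[0] + w[1] > 6        # sum of the two throws exceeds 6

p_AB = prob(lambda w: A(w) and B(w))
p_AD = prob(lambda w: A(w) and D(w))

print(p_AB == prob(A) * prob(B))     # True: A, B are independent
print(p_AD, prob(A) * prob(D))       # 1/4 7/24, so A, D are not independent
```

Exact rational arithmetic is used so that the product rule is tested as an identity rather than up to rounding.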




Exercise 3.23 Show that $A_1, A_2$ are independent events if and only if
$A_1, \Omega \setminus A_2$ are independent events.


Exercise 3.24 Show that $A_1, A_2$ are independent events if and only if
the indicator functions $1_{A_1}, 1_{A_2}$ are independent random variables.


Now suppose we know that an event $A$ has occurred. This means that $A$
can take the place of $\Omega$. For any event $B$ only the part of $B$ lying within $A$
matters now, so we replace $B$ by $A \cap B$ in order to compute the probability
of $B$ given that $A$ has occurred. We normalise by dividing by $P(A)$ to define
the conditional probability of $B$ given $A$ as

$$P(B|A) = \frac{P(A \cap B)}{P(A)}. \tag{3.11}$$

This makes sense whenever $P(A) \neq 0$. It is then natural to consider the
events $A, B$ as independent if the prior occurrence of $A$ does not influence
the probability of $B$, that is, if

$$P(B|A) = P(B).$$

(It is equivalent to $P(A|B) = P(A)$ when $P(B) \neq 0$ in addition to $P(A) \neq 0$.)
In Example 3.34 this is simply the statement that the outcomes of the first
throw of the die do not affect the outcome of the second. It is consistent
with Definition 3.33 of independent events,

$$P(A \cap B) = P(A)P(B),$$

with the apparent advantage that the latter also applies when $P(B) = 0$ or
$P(A) = 0$.
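
As a small illustration of (3.11), the sketch below computes conditional probabilities for the two throws of a die from Example 3.34, with events represented as subsets of the sample space. The function names `prob` and `cond_prob` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))   # two throws of a fair die

def prob(event):
    """P(event) for a subset of omega under the uniform probability."""
    return Fraction(len(event), len(omega))

def cond_prob(b, a):
    """P(B | A) = P(A and B) / P(A); only defined when P(A) != 0."""
    if prob(a) == 0:
        raise ValueError("P(A) = 0, conditional probability undefined")
    return prob(a & b) / prob(a)

A = {w for w in omega if w[0] % 2 == 1}        # first throw is odd
B = {w for w in omega if w[1] % 2 == 1}        # second throw is odd
D = {w for w in omega if w[0] + w[1] > 6}      # sum exceeds 6

print(cond_prob(B, A) == prob(B))   # True: knowing A does not change P(B)
print(cond_prob(D, A), prob(D))     # 1/2 7/12: knowing A changes P(D)
```

The explicit check for $P(A) = 0$ mirrors the remark that the product form of the definition remains usable in that case, whereas the conditional form does not.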


Families of independent events 


Extending the definition of independence to more than two events requires 
some care. It is tempting to propose that A, B, C be called independent if 


$$P(A \cap B \cap C) = P(A)P(B)P(C), \tag{3.12}$$


but the following exercises show that this would not be satisfactory. 




Exercise 3.25 Find subsets A, B, C of [0, 1] with uniform probability 
such that (3.12) holds, but A, B are not independent. Can you find three 
subsets such that each pair is independent, but (3.12) fails? 


Exercise 3.26 Find another example by considering the events A, B 
in Example 3.34 together with a third event C chosen so that each pair 
of these events is independent, but (3.12) fails. 


This leads us to make the following general definition. 


Definition 3.35 
A finite family of events $A_1, \dots, A_n \in \mathcal{F}$ is said to be independent if

$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = \prod_{j=1}^{k} P(A_{i_j})$$

for any $k = 2, \dots, n$ and for any $1 \le i_1 < \cdots < i_k \le n$.
An arbitrary family of events is defined to be independent if each of its
finite subfamilies is independent.
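
The definition asks for the product rule on every subfamily, not only on pairs or on the whole family. A minimal sketch of such a check on a finite sample space is given below; the coin-tossing events and the function `independent` are assumptions chosen for illustration and do not come from the text.

```python
from fractions import Fraction
from itertools import combinations, product

omega = set(product("HT", repeat=2))           # two tosses of a fair coin

def prob(event):
    return Fraction(len(event), len(omega))

def independent(events):
    """Check the product rule for every subfamily of size k = 2, ..., n."""
    for k in range(2, len(events) + 1):
        for sub in combinations(events, k):
            inter = set(omega)
            rhs = Fraction(1)
            for e in sub:
                inter &= e
                rhs *= prob(e)
            if prob(inter) != rhs:
                return False
    return True

A = {w for w in omega if w[0] == "H"}          # first toss is heads
B = {w for w in omega if w[1] == "H"}          # second toss is heads
C = {w for w in omega if w[0] == w[1]}         # both tosses agree

pairs_ok = all(independent(list(p)) for p in combinations([A, B, C], 2))
print(pairs_ok)                 # True: each pair is independent
print(independent([A, B, C]))   # False: the triple fails the product rule
```

Here each event has probability 1/2 and each pairwise intersection is {HH}, but the triple intersection is also {HH}, with probability 1/4 rather than 1/8.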


It is immediate from this definition that a subcollection of a family of 
independent events is also independent. 


Exercise 3.27 Suppose $A, B, C$ are independent events. Show that
$A \cup B$ and $C$ are independent.


Exercise 3.28 Show that $A_1, A_2, \dots, A_n$ are independent events if and
only if the indicator functions $1_{A_1}, 1_{A_2}, \dots, 1_{A_n}$ are independent random
variables.


Example 3.36 
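Let $(\Omega_i, \mathcal{F}_i, P_i)$ be a probability space for each $i = 1, 2, \dots, n$, and suppose
that $\Omega = \Omega_1 \times \Omega_2 \times \cdots \times \Omega_n$ is equipped with the product $\sigma$-field
$\mathcal{F} = \mathcal{F}_1 \otimes \cdots \otimes \mathcal{F}_n$ and the product probability $P = P_1 \otimes \cdots \otimes P_n$.
Cylinder sets are defined as Cartesian products of the form

$$C_i = \Omega_1 \times \cdots \times \Omega_{i-1} \times A_i \times \Omega_{i+1} \times \cdots \times \Omega_n$$

for some $i = 1, 2, \dots, n$ and $A_i \in \mathcal{F}_i$. By the definition of the product measure,
$P(C_i) = P_i(A_i)$. Moreover, for any $i \neq j$, say with $i < j$,

$$\begin{aligned}
P(C_i \cap C_j)
&= P(\Omega_1 \times \cdots \times \Omega_{i-1} \times A_i \times \Omega_{i+1} \times \cdots \times \Omega_{j-1} \times A_j \times \Omega_{j+1} \times \cdots \times \Omega_n) \\
&= P_i(A_i)P_j(A_j) \\
&= P(C_i)P(C_j),
\end{aligned}$$

so the cylinder sets $C_i, C_j$ are independent. By extending this argument, we
can show that $C_1, C_2, \dots, C_n$ are independent.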


Two independent $\sigma$-fields


The defining identity (3.8) for independent random variables $X, Y$ can be
written as

$$P(A_1 \cap A_2) = P(A_1)P(A_2)$$

for any events of the form $A_1 = \{X \in B_1\}$ and $A_2 = \{Y \in B_2\}$ with
$B_1, B_2 \in \mathcal{B}(\mathbb{R})$. In other words, $X, Y$ are independent if and only if for any
$A_1 \in \sigma(X)$ and $A_2 \in \sigma(Y)$ the events $A_1, A_2$ are independent.

As we can see, independence of random variables $X, Y$ can be expressed
in terms of the generated $\sigma$-fields $\sigma(X), \sigma(Y)$. This suggests that the notion
of independence can be extended to arbitrary $\sigma$-fields $\mathcal{G}_1, \mathcal{G}_2 \subset \mathcal{F}$.


Definition 3.37 
We say that $\sigma$-fields $\mathcal{G}_1, \mathcal{G}_2 \subset \mathcal{F}$ are independent if for any $A_1 \in \mathcal{G}_1$ and
$A_2 \in \mathcal{G}_2$ the events $A_1, A_2$ are independent.


We can now say that random variables $X, Y$ are independent if and only if
the $\sigma$-fields $\sigma(X), \sigma(Y)$ are independent. As a simple application, we show
the following proposition.


Proposition 3.38

Suppose that $X, Y$ are independent random variables and $U, W$ are random
variables measurable with respect to $\sigma(X)$ and, respectively, $\sigma(Y)$. Then
$U, W$ are independent.




Proof Since $U$ is measurable with respect to $\sigma(X)$ and $W$ is measurable
with respect to $\sigma(Y)$, we have $\sigma(U) \subset \sigma(X)$ and $\sigma(W) \subset \sigma(Y)$.
Independence of $X, Y$ means that the $\sigma$-fields $\sigma(X), \sigma(Y)$ are independent, which
implies immediately that the sub-$\sigma$-fields $\sigma(U), \sigma(W)$ are independent, and
so the random variables $U, W$ themselves are independent. □


Corollary 3.39 

If $X, Y$ are independent random variables and $g, h : \mathbb{R} \to \mathbb{R}$ are
Borel-measurable functions, then $g(X), h(Y)$ are also independent random
variables.


Exercise 3.29 Show that $A, B$ are independent events if and only if
the $\sigma$-fields $\{\emptyset, A, \Omega \setminus A, \Omega\}$ and $\{\emptyset, B, \Omega \setminus B, \Omega\}$ are independent.


Exercise 3.30 What can you say about a $\sigma$-field that is independent
of itself?


Remark 3.40 
Extending Definition 3.26, we can say that two random vectors

$$X = (X_1, \dots, X_m), \quad Y = (Y_1, \dots, Y_n)$$

with values, respectively, in $\mathbb{R}^m$ and $\mathbb{R}^n$ are independent whenever

$$P_{X,Y}(B_1 \times B_2) = P_X(B_1)P_Y(B_2)$$

for all Borel sets $B_1 \in \mathcal{B}(\mathbb{R}^m)$ and $B_2 \in \mathcal{B}(\mathbb{R}^n)$, where by $P_{X,Y}$ we denote the
joint distribution of the random vector $(X_1, \dots, X_m, Y_1, \dots, Y_n)$.

Equivalently, we can say that the random vectors $X, Y$ are independent
whenever the $\sigma$-fields $\sigma(X), \sigma(Y)$ generated by them are independent, where
by definition $\sigma(X)$ consists of all events of the form $\{X \in B\}$ with
$B \in \mathcal{B}(\mathbb{R}^m)$ and, similarly, $\sigma(Y)$ consists of all events of the form $\{Y \in B\}$ with
$B \in \mathcal{B}(\mathbb{R}^n)$.


Families of independent $\sigma$-fields


The notion of independence is readily extended to any finite number of
$\sigma$-fields.




Definition 3.41 
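We say that $\sigma$-fields $\mathcal{G}_1, \mathcal{G}_2, \dots, \mathcal{G}_n \subset \mathcal{F}$ are independent if for any
$A_1 \in \mathcal{G}_1, A_2 \in \mathcal{G}_2, \dots, A_n \in \mathcal{G}_n$ the events $A_1, A_2, \dots, A_n$ are independent.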


Exercise 3.31 Show that random variables $X_1, X_2, \dots, X_n$ are
independent if and only if their generated $\sigma$-fields $\sigma(X_1), \sigma(X_2), \dots, \sigma(X_n)$
are independent.


Exercise 3.32 Show that events $A_1, A_2, \dots, A_n \in \mathcal{F}$ are independent
if and only if the $\sigma$-fields $\mathcal{G}_1, \mathcal{G}_2, \dots, \mathcal{G}_n$ are independent, where
$\mathcal{G}_k = \{\emptyset, A_k, \Omega \setminus A_k, \Omega\}$ for $k = 1, 2, \dots, n$.


Given a random variable $Y$ on a probability space $(\Omega, \mathcal{F}, P)$ and a $\sigma$-field
$\mathcal{G} \subset \mathcal{F}$ it is now natural to say that $Y$ is independent of $\mathcal{G}$ if $\sigma(Y)$ and $\mathcal{G}$ are
independent $\sigma$-fields.


Exercise 3.33 Suppose that $X_1, X_2, \dots, X_n, Y$ are independent
random variables. Show that $Y$ is independent of the $\sigma$-field $\sigma(X)$
generated by the random vector $X = (X_1, X_2, \dots, X_n)$. By definition, the
$\sigma$-field $\sigma(X)$ consists of all sets of the form $\{X \in B\}$ such that $B \in \mathcal{B}(\mathbb{R}^n)$.


Independence: expectation and variance 


Theorem 3.42 
If X,Y are independent integrable random variables, then the product XY 
is also integrable and 


E(XY) = E(X)E(Y). 


Proof First suppose that $X = \sum_{i=1}^{m} a_i 1_{A_i}$ and $Y = \sum_{j=1}^{n} b_j 1_{B_j}$ are simple
functions. Then $XY = \sum_{i=1}^{m} \sum_{j=1}^{n} a_i b_j 1_{A_i \cap B_j}$. We may assume without loss
of generality that the $a_i$ are distinct, so $A_i = \{X = a_i\}$ for each $i = 1, \dots, m$,
and that the $b_j$ are also distinct, so $B_j = \{Y = b_j\}$ for each $j = 1, \dots, n$. If
$X, Y$ are independent, then so are $A_i, B_j$ for each $i, j$, and

$$\begin{aligned}
E(XY) &= \sum_{i=1}^{m} \sum_{j=1}^{n} a_i b_j P(A_i \cap B_j) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_i b_j P(A_i)P(B_j) \\
&= \left(\sum_{i=1}^{m} a_i P(A_i)\right)\left(\sum_{j=1}^{n} b_j P(B_j)\right) = E(X)E(Y).
\end{aligned}$$

Now define

$$f_n(x) = \begin{cases} \dfrac{k-1}{2^n} & \text{for } \dfrac{k-1}{2^n} \le x < \dfrac{k}{2^n},\ k = 1, 2, \dots, n2^n, \\[1ex] n & \text{for } x \ge n \end{cases}$$

for each $n = 1, 2, \dots$. If $X$ is a non-negative random variable, then
$X_n = f_n(X)$ is a non-decreasing sequence and $\lim_{n\to\infty} X_n = X$. If $X$ is not
necessarily non-negative, we put

$$X_n = f_n(X^+) - f_n(X^-).$$

Then $X_n$ is a sequence of simple functions such that $\lim_{n\to\infty} X_n = X$,
and $|X_n|$ is a non-decreasing sequence of simple functions such that
$\lim_{n\to\infty} |X_n| = |X|$, so $\lim_{n\to\infty} E(|X_n|) = E(|X|)$ by monotone convergence,
see Theorem 1.31. Similarly, we put

$$Y_n = f_n(Y^+) - f_n(Y^-),$$

which have similar properties as the $X_n$. It follows that $|X_n Y_n| = |X_n| |Y_n|$ is
a non-decreasing sequence of simple functions such that $\lim_{n\to\infty} |X_n Y_n| = |XY|$.
Using monotone convergence once again, we have $\lim_{n\to\infty} E(|X_n Y_n|) = E(|XY|)$.
If $X, Y$ are independent, then by Corollary 3.39, so are $X_n, Y_n$ and
also $|X_n|, |Y_n|$, so the result already established for independent simple
random variables yields

$$E(X_n Y_n) = E(X_n)E(Y_n) \quad \text{and} \quad E(|X_n Y_n|) = E(|X_n|)E(|Y_n|)$$

for any $n$. If, in addition, $X$ and $Y$ are both integrable, then

$$E(|XY|) = \lim_{n\to\infty} E(|X_n Y_n|) = \lim_{n\to\infty} E(|X_n|)E(|Y_n|) = E(|X|)E(|Y|) < \infty,$$

which means that $|XY|$ is integrable. Finally, because $|X_n Y_n| \le |XY|$ and
$\lim_{n\to\infty} X_n Y_n = XY$, by dominated convergence, see Theorem 1.43, we can
conclude that $XY$ is integrable and

$$E(XY) = \lim_{n\to\infty} E(X_n Y_n) = \lim_{n\to\infty} E(X_n)E(Y_n) = E(X)E(Y),$$

completing the proof. □
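
The approximating sequence used in the proof can be made concrete. The sketch below implements the dyadic staircase function, assuming the usual form in which values are rounded down to multiples of $2^{-n}$ and capped at $n$, and applies it to a single sample value of $X$. The helper name `approx` and the sample value are illustrative assumptions.

```python
import math

def f_n(x, n):
    """Dyadic staircase: (k-1)/2^n on [(k-1)/2^n, k/2^n), capped at n."""
    if x < 0:
        raise ValueError("f_n is defined for non-negative x only")
    if x >= n:
        return float(n)
    return math.floor(x * 2**n) / 2**n

def approx(x, n):
    """X_n = f_n(X^+) - f_n(X^-) evaluated at a single value x of X."""
    return f_n(max(x, 0.0), n) - f_n(max(-x, 0.0), n)

x = -2.7                                   # an arbitrary sample value of X
values = [approx(x, n) for n in range(1, 8)]
print(values)
```

For this sample value the absolute values $|X_n|$ increase towards $|X|$ and never exceed it, which is the property that allows monotone and dominated convergence to be applied in the proof.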




Exercise 3.34 Prove the following version of Theorem 3.42 extended
to the case of $n$ random variables.

If $X_1, X_2, \dots, X_n$ are independent integrable random variables, then
the product $\prod_{i=1}^{n} X_i$ is also integrable and

$$E\left(\prod_{i=1}^{n} X_i\right) = \prod_{i=1}^{n} E(X_i).$$


Example 3.43 
The converse of Theorem 3.42 is false. For a simple counterexample take
$X(\omega) = \omega$ and $Y(\omega) = \omega^2$ on $[-\frac{1}{2}, \frac{1}{2}]$ with Lebesgue measure. Since $X$ and $XY$
are both odd functions, their integrals over $[-\frac{1}{2}, \frac{1}{2}]$ are 0, so that $E[XY] = E[X]E[Y]$.
However, $X, Y$ are not independent, as we can verify by taking
the inverse images of $B = [-\frac{1}{9}, \frac{1}{9}]$ under $X$ and $Y$. We see that
$\{X \in B\} = [-\frac{1}{9}, \frac{1}{9}]$ and $\{Y \in B\} = [-\frac{1}{3}, \frac{1}{3}]$, and these are not independent since their
intersection has measure $\frac{2}{9}$ whereas the product of their measures is $\frac{2}{9} \times \frac{2}{3} = \frac{4}{27}$.
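
A counterexample of this kind can be verified with exact arithmetic. The sketch below assumes the specific choice $X(\omega) = \omega$, $Y(\omega) = \omega^2$ on $[-\frac{1}{2}, \frac{1}{2}]$ with Lebesgue measure and the Borel set $B = [-\frac{1}{9}, \frac{1}{9}]$; these concrete values, and the helper `moment`, are assumptions made for the purpose of illustration.

```python
from fractions import Fraction as F

half = F(1, 2)

def moment(k):
    """E[w^k] for w uniform on [-1/2, 1/2] (an interval of total length 1)."""
    return (half**(k + 1) - (-half)**(k + 1)) / (k + 1)

# Assumed choice: X(w) = w and Y(w) = w^2, so that XY = w^3.
E_X, E_Y, E_XY = moment(1), moment(2), moment(3)
print(E_XY == E_X * E_Y)                    # True: E(XY) = E(X)E(Y)

# Assumed Borel set B = [-1/9, 1/9].
b = F(1, 9)
len_X_in_B = 2 * b                          # {X in B} = [-1/9, 1/9]
len_Y_in_B = 2 * F(1, 3)                    # {Y in B} = [-1/3, 1/3]
len_both = min(len_X_in_B, len_Y_in_B)      # nested intervals centred at 0
print(len_both, len_X_in_B * len_Y_in_B)    # 2/9 4/27
```

The expectations factorise, yet the two events fail the product rule, so zero covariance does not force independence.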


The following important result illustrates the extent to which knowledge 
of expectations provides a sufficient condition for independence. 


Theorem 3.44 
Random variables $X, Y$ are independent if and only if

$$E[f(X)g(Y)] = E[f(X)]E[g(Y)] \tag{3.13}$$

for all choices of bounded Borel measurable functions $f, g : \mathbb{R} \to \mathbb{R}$.


Proof Suppose (3.13) holds, and apply it with the indicators $1_B, 1_C$ for
Borel sets $B, C \in \mathcal{B}(\mathbb{R})$. Then (3.13) becomes simply

$$P(X \in B, Y \in C) = P(X \in B)P(Y \in C).$$

This holds for arbitrary sets $B, C \in \mathcal{B}(\mathbb{R})$, so $X, Y$ are independent.
Conversely, if $X$ and $Y$ are independent and $f, g$ are real Borel functions,
then Corollary 3.39 tells us that $f(X)$ and $g(Y)$ are independent. If $f, g$ are
bounded, then $f(X)$ and $g(Y)$ are integrable, so by Theorem 3.42 we have
(3.13). □




Recalling that for a complex-valued $f = \mathrm{Re} f + i\,\mathrm{Im} f$ we define
$E(f) = E(\mathrm{Re} f) + iE(\mathrm{Im} f)$, we can see that (3.13) extends to bounded
complex-valued Borel measurable functions. We therefore immediately have the
following way of finding the characteristic function of the sum of independent
random variables.


Corollary 3.45
If $X, Y$ are independent random variables, then

$$\phi_{X+Y}(t) = \phi_X(t)\phi_Y(t).$$


Exercise 3.35 Recall from Exercise 2.25 the characteristic function 
of a standard normal random variable. Use this to find the characteris- 
tic function of the linear combination aX + bY of independent standard 
normal random variables X, Y. 


Proposition 3.46 
If X,Y are independent integrable random variables, then 


Var(X + Y) = Var(X) + Var(Y). 


Proof The random variables $V = X - E(X)$ and $W = Y - E(Y)$ are
integrable because $X$ and $Y$ are. Moreover, since $X, Y$ are independent, so
are $V, W$ by Corollary 3.39. It follows that $VW$ is integrable and
$E(VW) = E(V)E(W) = 0$. Applying expectation to both sides of the equality

$$(V + W)^2 = V^2 + 2VW + W^2,$$

we get

$$\begin{aligned}
\mathrm{Var}(X + Y) &= E((V + W)^2) = E(V^2) + 2E(VW) + E(W^2) \\
&= E(V^2) + E(W^2) = \mathrm{Var}(X) + \mathrm{Var}(Y).
\end{aligned}$$


Exercise 3.36 Prove the following version of Proposition 3.46 extended
to the case of $n$ random variables.
If $X_1, \dots, X_n$ are independent integrable random variables, then

$$\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n).$$




3.6 Covariance 


Covariance and correlation can serve as tools to quantify the dependence 
between random variables, which in general may or may not be indepen- 
dent. 


Definition 3.47 
For integrable random variables $X, Y$ whose product $XY$ is also integrable
we define the covariance as

$$\mathrm{Cov}(X, Y) = E\big((X - E(X))(Y - E(Y))\big) = E(XY) - E(X)E(Y).$$

If, in addition, $\mathrm{Var}(X) \neq 0$ and $\mathrm{Var}(Y) \neq 0$, we can define the correlation
coefficient of $X, Y$ as

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}.$$


It is not hard to verify the following properties of covariance, which are 
due to the linearity of expectation: 


Cov(aX, Y) = aCov(X, Y), 
Cov(W + X, Y) = Cov(W, Y) + Cov(X, Y). 


It is also worth noting that, in general, 
Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y). 
Observe further that 
Cov(X, Y) = Cov(Y, X) 
and 


Cov(X, X) = Var(X). 
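
These identities are easy to confirm on a finite probability space. The following sketch uses two throws of a fair die; the random variables `X`, `Y`, `S` and the helpers `E`, `cov`, `var` are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction as F
from itertools import product

# Two throws of a fair die; X = first throw, Y = second throw, S = X + Y.
omega = list(product(range(1, 7), repeat=2))
p = F(1, len(omega))

def E(g):
    """Expectation of g(w) under the uniform probability on omega."""
    return sum(g(w) * p for w in omega)

def cov(g, h):
    """Cov(g, h) = E(gh) - E(g)E(h)."""
    return E(lambda w: g(w) * h(w)) - E(g) * E(h)

X = lambda w: w[0]
Y = lambda w: w[1]
S = lambda w: w[0] + w[1]
var = lambda g: cov(g, g)                 # Cov(X, X) = Var(X)

print(cov(X, Y))                          # 0, since X and Y are independent
print(var(S) == var(X) + var(Y))          # True
X_plus_S = lambda w: X(w) + S(w)
print(var(X_plus_S) == var(X) + var(S) + 2 * cov(X, S))   # True
print(cov(X, S))                          # 35/12, equal to Var(X)
```

The last line also illustrates additivity in the second argument: Cov(X, X + Y) = Var(X) + Cov(X, Y) = Var(X).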


Exercise 3.37 Suppose X, Y have bivariate normal distribution with 
density (3.5). Compute Cov(X, Y). 


It follows from Definition 3.47 that independent random variables $X, Y$
have zero covariance and zero correlation (when it exists). More generally,
we say that $X$ and $Y$ are uncorrelated if $\rho_{X,Y} = 0$. Example 3.43 shows
that uncorrelated random variables need not be independent. However, for
normally distributed random variables the two concepts coincide.




Exercise 3.38 Show that if $X, Y$ have joint distribution with density (3.5)
for some constant $\rho \in (-1, 1)$, then their correlation is given
by $\rho_{X,Y} = \rho$. Hence show that if such $X, Y$ are uncorrelated, then they
are independent.


Next, we prove an important general inequality which gives a bound
for $\mathrm{Cov}(X, Y)$ in terms of $\mathrm{Var}(X)$ and $\mathrm{Var}(Y)$ or, equivalently, a bound for
$\rho_{X,Y}$. Anticipating the terminology to be used extensively later, we make
the following definition.


Definition 3.48
A random variable $X$ is said to be square-integrable if $X^2$ is integrable.


Lemma 3.49 (Schwarz inequality) 
If $X$ and $Y$ are square-integrable random variables, then $XY$ is integrable,
and

$$[E(XY)]^2 \le E(X^2)E(Y^2).$$


Proof Observe that for any $t \in \mathbb{R}$ we have $0 \le (X + tY)^2 = X^2 + 2tXY + t^2Y^2$,
and these random variables are integrable since $X$ and $Y$ are
square-integrable (note that $|XY| \le \frac{1}{2}(X^2 + Y^2)$). As a result,

$$0 \le E((X + tY)^2) = E(X^2) + 2tE(XY) + t^2E(Y^2)$$

for any $t \in \mathbb{R}$. For this quadratic expression in $t$ to be non-negative for all
$t \in \mathbb{R}$, its discriminant must be non-positive, that is,

$$[2E(XY)]^2 - 4E(X^2)E(Y^2) \le 0,$$

which proves the Schwarz inequality. □


Applying Lemma 3.49 to the centred random variables $X - E(X)$ and $Y - E(Y)$
it is now easy to verify the following bounds for $\mathrm{Cov}(X, Y)$ and $\rho_{X,Y}$.


Corollary 3.50
The following inequalities hold:

$$[\mathrm{Cov}(X, Y)]^2 \le \mathrm{Var}(X)\mathrm{Var}(Y),$$
$$-1 \le \rho_{X,Y} \le 1.$$


Exercise 3.39 Suppose that $|\rho_{X,Y}| = 1$. What is the relationship
between $X$ and $Y$?




Finally, we can quantify the dependencies between n random variables 
by means of the covariance matrix defined as follows. 


Definition 3.51 

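For a random vector $X = (X_1, X_2, \dots, X_n)$ consisting of integrable random
variables such that the product $X_i X_j$ is integrable for each $i, j = 1, 2, \dots, n$
we define the covariance matrix to be the $n \times n$ square matrix with entries
$\mathrm{Cov}(X_i, X_j)$ for $i, j = 1, 2, \dots, n$, that is, the matrix

$$C = \begin{pmatrix}
\mathrm{Cov}(X_1, X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_n) \\
\mathrm{Cov}(X_2, X_1) & \mathrm{Cov}(X_2, X_2) & \cdots & \mathrm{Cov}(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(X_n, X_1) & \mathrm{Cov}(X_n, X_2) & \cdots & \mathrm{Cov}(X_n, X_n)
\end{pmatrix}.$$

Since $\mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i)$, the covariance matrix is symmetric.
The diagonal elements are $\mathrm{Cov}(X_i, X_i) = \mathrm{Var}(X_i)$. Also note that for any
vector $a = (a_1, a_2, \dots, a_n) \in \mathbb{R}^n$ we have

$$\begin{aligned}
0 &\le \mathrm{Var}(a_1 X_1 + \cdots + a_n X_n) \\
&= \mathrm{Cov}(a_1 X_1 + \cdots + a_n X_n,\ a_1 X_1 + \cdots + a_n X_n) \\
&= a^T C a,
\end{aligned}$$

which means that the covariance matrix is non-negative definite.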
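
To see symmetry and non-negative definiteness in a concrete case, the sketch below builds the covariance matrix of a small random vector defined on three fair coin tosses and evaluates the quadratic form on a grid of integer vectors. The random vector and all helper names are assumptions chosen for illustration; checking a finite grid is only a sanity test, not a proof.

```python
from fractions import Fraction as F
from itertools import product

# Three fair coin tosses (1 = heads); a random vector built from them.
omega = list(product((0, 1), repeat=3))
p = F(1, len(omega))
X = [lambda w: w[0], lambda w: w[0] + w[1], lambda w: w[1] - w[2]]

def E(g):
    return sum(g(w) * p for w in omega)

def cov(g, h):
    return E(lambda w: g(w) * h(w)) - E(g) * E(h)

n = len(X)
C = [[cov(X[i], X[j]) for j in range(n)] for i in range(n)]

def quad_form(a):
    """a^T C a, which equals Var(a_1 X_1 + ... + a_n X_n)."""
    return sum(a[i] * C[i][j] * a[j] for i in range(n) for j in range(n))

symmetric = all(C[i][j] == C[j][i] for i in range(n) for j in range(n))
grid = list(product(range(-2, 3), repeat=n))
print(symmetric, all(quad_form(a) >= 0 for a in grid))   # True True
```

For instance, with $a = (1, -1, 1)$ the combination reduces to minus the third toss, so the quadratic form equals its variance, 1/4.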


3.7 Proofs by means of d-systems 


The idea behind the proof of Theorem 3.5 is to observe that the desired
properties hold for all sets in the $\sigma$-field generated by $\mathcal{R}$, the class of
measurable rectangles. For example, setting

$$\mathcal{D} = \{A \in \sigma(\mathcal{R}) : \omega_1 \mapsto \mu_2(A_{\omega_1}) \text{ is } \mathcal{F}_1\text{-measurable}\},$$

we wish to show that $\mathcal{D} = \sigma(\mathcal{R})$ in order to prove that $\omega_1 \mapsto \mu_2(A_{\omega_1})$ is
$\mathcal{F}_1$-measurable for every $A \in \sigma(\mathcal{R})$ (and similarly for the function
$\omega_2 \mapsto \mu_1(A_{\omega_2})$). It is clear that $\mathcal{R} \subset \mathcal{D}$ since for $A = A_1 \times A_2 \in \mathcal{R}$ we have
$\mu_2(A_{\omega_1}) = 1_{A_1}(\omega_1)\mu_2(A_2)$. It would therefore suffice to prove that $\mathcal{D}$ is a $\sigma$-field in
order to verify part (i) of the theorem.

Similarly, to ensure that $\mu$ is uniquely determined on $\mathcal{F}_1 \otimes \mathcal{F}_2$ by its
defining property (3.2) on $\mathcal{R}$, it would suffice to prove that, given two finite
measures $\nu_1, \nu_2$ that agree on $\mathcal{R}$, the collection of all sets on which they
agree is a $\sigma$-field.




Here we have two examples of a general proof technique which finds
frequent use in measure theory. First we observe that a certain property
holds on a given collection $\mathcal{C}$ of subsets, and show that the collection $\mathcal{D}$ of
all sets that satisfy this property is a $\sigma$-field. Since the $\sigma$-field $\mathcal{D}$ contains $\mathcal{C}$,
it must contain $\sigma(\mathcal{C})$, so the property holds on $\sigma(\mathcal{C})$. Rather than verifying
directly that $\mathcal{D}$ satisfies the definition of a $\sigma$-field, it is often easier to check
that $\mathcal{D}$ meets the following requirements.


Definition 3.52
A system $\mathcal{D}$ of subsets of $\Omega$ is called a d-system on $\Omega$ when the following
conditions are satisfied:
(i) $\Omega \in \mathcal{D}$;
(ii) if $A, B \in \mathcal{D}$ with $A \subset B$, then $B \setminus A \in \mathcal{D}$;
(iii) if $A_i \subset A_{i+1}$ and $A_i \in \mathcal{D}$ for $i = 1, 2, \dots$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{D}$.


Similarly as for $\sigma$-fields, the smallest d-system on $\Omega$ that contains a
family $\mathcal{C}$ of subsets of $\Omega$ is given by

$$d(\mathcal{C}) = \bigcap \{\mathcal{D} : \mathcal{D} \text{ is a d-system on } \Omega,\ \mathcal{C} \subset \mathcal{D}\}$$

and called the d-system generated by $\mathcal{C}$.

It is clear that the conditions defining a d-system are weaker than those
for a $\sigma$-field; compare Definitions 1.10 (ii) and 3.52.
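
That the conditions are strictly weaker can be seen on a four-point set. The sketch below assumes a standard example that is not taken from the text: the family of all subsets of $\{1, 2, 3, 4\}$ with an even number of elements. The function names are hypothetical.

```python
from itertools import combinations

omega = frozenset({1, 2, 3, 4})

# Assumed example: all subsets of omega with an even number of elements.
D = {frozenset(s) for k in (0, 2, 4) for s in combinations(omega, k)}

def is_d_system(family):
    """Check conditions (i)-(iii) of a d-system on the finite set omega."""
    if omega not in family:                                   # (i)
        return False
    for a in family:                                          # (ii)
        for b in family:
            if a <= b and (b - a) not in family:
                return False
    # (iii): on a finite set an increasing sequence is eventually constant,
    # so its union is its largest member and already lies in the family.
    return True

def closed_under_intersection(family):
    return all((a & b) in family for a in family for b in family)

print(is_d_system(D))                  # True
print(closed_under_intersection(D))    # False, e.g. {1,2} and {2,3} meet in {2}
```

Since a $\sigma$-field is closed under intersection, this family is a d-system that is not a $\sigma$-field.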

Since every $\sigma$-field is a d-system, for any collection $\mathcal{C}$ we have
$d(\mathcal{C}) \subset \sigma(\mathcal{C})$. If, for a particular collection $\mathcal{C}$, we can prove the opposite
inclusion, our task of checking that the desired property holds on $\sigma(\mathcal{C})$ will be
accomplished by checking the conditions defining a d-system.

Of course, we cannot expect this to be true for an arbitrary collection C. 
However, the following simple property of C is sufficient. 


Definition 3.53 
A family $\mathcal{C}$ is closed under intersection if $A, B \in \mathcal{C}$ implies $A \cap B \in \mathcal{C}$.


An immediate example of such a collection is given by the measurable 
rectangles. 


Exercise 3.40 Show that the family R of measurable rectangles is 
closed under intersection. 


When a family $\mathcal{C}$ of subsets of a given set $\Omega$ is closed under intersection,
the d-system and the $\sigma$-field generated by $\mathcal{C}$ turn out to be the same. We
prove this below.




Lemma 3.54 
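Suppose a family $\mathcal{C}$ of subsets of $\Omega$ is closed under intersection. Then $d(\mathcal{C})$
is closed under intersection.


Proof Consider the family of sets

$$\mathcal{G} = \{A \in d(\mathcal{C}) : A \cap C \in d(\mathcal{C}) \text{ for all } C \in \mathcal{C}\}.$$

Since $\mathcal{C}$ is closed under intersection and $\mathcal{C} \subset d(\mathcal{C})$, we have $\mathcal{C} \subset \mathcal{G}$. We
claim that $\mathcal{G}$ is a d-system. Obviously, $\Omega \cap C = C \in \mathcal{C}$ for each $C \in \mathcal{C}$, so
$\Omega \in \mathcal{G}$. If $A, B \in \mathcal{G}$ and $A \subset B$, then for any $C \in \mathcal{C}$

$$(B \setminus A) \cap C = (B \cap C) \setminus (A \cap C) \in d(\mathcal{C})$$

since $A \cap C, B \cap C \in d(\mathcal{C})$ and $A \cap C \subset B \cap C$. Thus $B \setminus A \in \mathcal{G}$.
Finally, suppose that $A_i \subset A_{i+1}$ and $A_i \in \mathcal{G}$ for $i = 1, 2, \dots$. Then for
any $C \in \mathcal{C}$ we have $A_i \cap C \subset A_{i+1} \cap C$ and $A_i \cap C \in d(\mathcal{C})$, so

$$\left(\bigcup_{i=1}^{\infty} A_i\right) \cap C = \bigcup_{i=1}^{\infty} (A_i \cap C) \in d(\mathcal{C}),$$

implying that $\bigcup_{i=1}^{\infty} A_i \in \mathcal{G}$. We have shown that $\mathcal{G}$ is a d-system such that
$\mathcal{C} \subset \mathcal{G} \subset d(\mathcal{C})$, hence $\mathcal{G} = d(\mathcal{C})$.


Now consider the family of sets

$$\mathcal{H} = \{A \in d(\mathcal{C}) : A \cap B \in d(\mathcal{C}) \text{ for all } B \in d(\mathcal{C})\}.$$

Because $\mathcal{G} = d(\mathcal{C})$, we know that $\mathcal{C} \subset \mathcal{H}$. Moreover, $\mathcal{H}$ is a d-system,
which can be verified in a very similar way as for $\mathcal{G}$. Since $\mathcal{H} \subset d(\mathcal{C})$, we
conclude that $\mathcal{H} = d(\mathcal{C})$, and this proves the lemma. □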


Lemma 3.55 
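A family $\mathcal{D}$ of subsets of $\Omega$ is a $\sigma$-field if and only if it is a d-system closed
under intersection.


Proof If $\mathcal{D}$ is a $\sigma$-field, it obviously is a d-system closed under intersection.
Conversely, if $\mathcal{D}$ is a d-system closed under intersection and $A, B \in \mathcal{D}$,
then $\Omega, \Omega \setminus A, \Omega \setminus B \in \mathcal{D}$. Hence $\Omega \setminus (A \cup B) = (\Omega \setminus A) \cap (\Omega \setminus B)$ is in $\mathcal{D}$.
But then so is its complement $A \cup B$ since $\Omega \in \mathcal{D}$. Finally, if $A_i \in \mathcal{D}$ for all
$i = 1, 2, \dots$, then the sets $B_k = \bigcup_{i=1}^{k} A_i$ belong to $\mathcal{D}$ by induction on what
has just been proved for $k = 2$. The $B_k$ increase to $\bigcup_{i=1}^{\infty} A_i$, so this union
also belongs to the d-system $\mathcal{D}$. We have verified that $\mathcal{D}$ is a $\sigma$-field. □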


Together, these two lemmas imply the result we seek. 




Proposition 3.56 
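If a family $\mathcal{C}$ of subsets of $\Omega$ is closed under intersection, then $d(\mathcal{C}) = \sigma(\mathcal{C})$.


Proof Since $\mathcal{C}$ is closed under intersection, it follows by Lemma 3.54 that
$d(\mathcal{C})$ is also closed under intersection. According to Lemma 3.55, $d(\mathcal{C})$ is
therefore a $\sigma$-field. Because $\mathcal{C} \subset d(\mathcal{C}) \subset \sigma(\mathcal{C})$ and $\sigma(\mathcal{C})$ is the smallest
$\sigma$-field containing $\mathcal{C}$ we can therefore conclude that $d(\mathcal{C}) = \sigma(\mathcal{C})$. □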


By Exercise 3.40 we have an immediate consequence for measurable 
rectangles. 


Corollary 3.57 
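The family $\mathcal{R}$ of measurable rectangles on $\Omega_1 \times \Omega_2$ satisfies

$$d(\mathcal{R}) = \sigma(\mathcal{R}).$$


Exercise 3.41 Show that

$$d(\mathcal{I}) = \sigma(\mathcal{I}),$$

where $\mathcal{I}$ is the family of open intervals in $\mathbb{R}$.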


The final step in our preparation for the proof of Theorem 3.5 will ensure
that the measure $\mu$ is uniquely defined on $\mathcal{F}_1 \otimes \mathcal{F}_2$ by the requirement that
for all measurable rectangles $A_1 \times A_2$,

$$\mu(A_1 \times A_2) = \mu_1(A_1)\mu_2(A_2).$$


Again, we shall phrase the result in terms of general families of sets to
enable us to use it in a variety of settings. Assume that $\mathcal{C}$ is a family of
subsets of a non-empty set $\Omega$.


Lemma 3.58 

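Suppose that $\mathcal{C}$ is closed under intersection. If $\mu$ and $\nu$ are measures defined
on the $\sigma$-field $\sigma(\mathcal{C})$ such that $\mu(A) = \nu(A)$ for every $A \in \mathcal{C}$ and
$\mu(\Omega) = \nu(\Omega) < \infty$, then $\mu(A) = \nu(A)$ for every $A \in \sigma(\mathcal{C})$.


Proof Consider the family of sets

$$\mathcal{D} = \{A \in \sigma(\mathcal{C}) : \mu(A) = \nu(A)\}.$$

Since the measures agree on $\mathcal{C}$ we know that $\mathcal{C} \subset \mathcal{D}$. Let us verify that $\mathcal{D}$
is a d-system. Since $\mu(\Omega) = \nu(\Omega)$, it follows that $\Omega \in \mathcal{D}$. For any $A, B \in \mathcal{D}$
such that $B \subset A$ we have

$$\mu(A \setminus B) = \mu(A) - \mu(B) = \nu(A) - \nu(B) = \nu(A \setminus B).$$

(Here it is important that $\mu$ and $\nu$ are finite measures.) Hence $A \setminus B \in \mathcal{D}$.
Moreover, for any non-decreasing sequence $A_i \subset A_{i+1}$ with $A_i \in \mathcal{D}$ for
$i = 1, 2, \dots$ we have

$$\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \lim_{i\to\infty} \mu(A_i) = \lim_{i\to\infty} \nu(A_i) = \nu\left(\bigcup_{i=1}^{\infty} A_i\right),$$

which shows that $\bigcup_{i=1}^{\infty} A_i \in \mathcal{D}$. We have shown that $\mathcal{D}$ is a d-system such
that $\mathcal{C} \subset \mathcal{D} \subset \sigma(\mathcal{C})$. Hence $d(\mathcal{C}) \subset \mathcal{D} \subset \sigma(\mathcal{C})$. Because $\mathcal{C}$ is closed under
intersection, it follows by Proposition 3.56 that $\mathcal{D} = \sigma(\mathcal{C})$, that is, the measures
$\mu$ and $\nu$ coincide on $\sigma(\mathcal{C})$. □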


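To prove Theorem 3.5 we now need to apply these general results to the
family $\mathcal{R}$ of measurable rectangles on $\Omega_1 \times \Omega_2$.


Theorem 3.5
Suppose that $\mu_1$ and $\mu_2$ are finite measures defined on the $\sigma$-fields $\mathcal{F}_1$ and
$\mathcal{F}_2$, respectively. Then:

(i) for any $A \in \mathcal{F}_1 \otimes \mathcal{F}_2$ the functions

$$\omega_1 \mapsto \mu_2(A_{\omega_1}), \quad \omega_2 \mapsto \mu_1(A_{\omega_2})$$

are measurable, respectively, with respect to $\mathcal{F}_1$ and $\mathcal{F}_2$;
(ii) for any $A \in \mathcal{F}_1 \otimes \mathcal{F}_2$ the following integrals are well defined and
equal to one another:

$$\int_{\Omega_1} \mu_2(A_{\omega_1})\,d\mu_1(\omega_1) = \int_{\Omega_2} \mu_1(A_{\omega_2})\,d\mu_2(\omega_2);$$

(iii) the function $\mu : \mathcal{F}_1 \otimes \mathcal{F}_2 \to [0, \infty)$ defined by

$$\mu(A) = \int_{\Omega_1} \mu_2(A_{\omega_1})\,d\mu_1(\omega_1) = \int_{\Omega_2} \mu_1(A_{\omega_2})\,d\mu_2(\omega_2) \tag{3.4}$$

for each $A \in \mathcal{F}_1 \otimes \mathcal{F}_2$ is a measure on the $\sigma$-field $\mathcal{F}_1 \otimes \mathcal{F}_2$;
(iv) $\mu$ is the only measure on $\mathcal{F}_1 \otimes \mathcal{F}_2$ such that

$$\mu(A_1 \times A_2) = \mu_1(A_1)\mu_2(A_2)$$

for each $A_1 \in \mathcal{F}_1$ and $A_2 \in \mathcal{F}_2$.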
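Proof (i) Define the family of sets

$$\mathcal{D} = \{A \in \sigma(\mathcal{R}) : \omega_1 \mapsto \mu_2(A_{\omega_1}) \text{ is } \mathcal{F}_1\text{-measurable}\}.$$

In order to prove that $\omega_1 \mapsto \mu_2(A_{\omega_1})$ is $\mathcal{F}_1$-measurable for any
$A \in \mathcal{F}_1 \otimes \mathcal{F}_2 = \sigma(\mathcal{R})$ it is enough to show that $\mathcal{D} = \sigma(\mathcal{R})$. Since
$\mu_2(A_{\omega_1}) = 1_{A_1}(\omega_1)\mu_2(A_2)$ for $A = A_1 \times A_2$, it follows that $\mathcal{R} \subset \mathcal{D}$. To show that $\mathcal{D}$ is a d-system
note first that $\Omega_1 \times \Omega_2$ belongs to $\mathcal{R}$, hence it belongs to $\mathcal{D}$. If $A \subset B$ and
$A, B \in \mathcal{D}$, then for any $\omega_1 \in \Omega_1$

$$(B \setminus A)_{\omega_1} = B_{\omega_1} \setminus A_{\omega_1} \quad \text{and} \quad A_{\omega_1} \subset B_{\omega_1},$$

so

$$\mu_2((B \setminus A)_{\omega_1}) = \mu_2(B_{\omega_1}) - \mu_2(A_{\omega_1}),$$

where $\omega_1 \mapsto \mu_2(A_{\omega_1})$ and $\omega_1 \mapsto \mu_2(B_{\omega_1})$ are measurable functions. Hence
$\omega_1 \mapsto \mu_2((B \setminus A)_{\omega_1})$ is a measurable function, so $B \setminus A \in \mathcal{D}$. Finally, given
a non-decreasing sequence $A_i \subset A_{i+1}$ with $A_i \in \mathcal{D}$ for $i = 1, 2, \dots$, we see
that $(A_i)_{\omega_1} \subset (A_{i+1})_{\omega_1}$ for any $\omega_1 \in \Omega_1$, so

$$\mu_2\left(\left(\bigcup_{i=1}^{\infty} A_i\right)_{\omega_1}\right) = \mu_2\left(\bigcup_{i=1}^{\infty} (A_i)_{\omega_1}\right) = \lim_{i\to\infty} \mu_2((A_i)_{\omega_1}).$$

By Exercise 1.19, this means that $\bigcup_{i=1}^{\infty} A_i \in \mathcal{D}$. We have shown that $\mathcal{D}$ is
a d-system containing $\mathcal{R}$. Hence $d(\mathcal{R}) \subset \mathcal{D}$. By Corollary 3.57 it follows
that $\mathcal{D} = \sigma(\mathcal{R})$, completing the proof. The same argument works for the
function $\omega_2 \mapsto \mu_1(A_{\omega_2})$.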

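(ii) For any $A \in \mathcal{F}_1 \otimes \mathcal{F}_2$ put

$$\nu_1(A) = \int_{\Omega_1} \mu_2(A_{\omega_1})\,d\mu_1(\omega_1), \quad \nu_2(A) = \int_{\Omega_2} \mu_1(A_{\omega_2})\,d\mu_2(\omega_2).$$

The integrals are well defined by part (i) of the theorem. We show that $\nu_1$
and $\nu_2$ are finite measures on $\mathcal{F}_1 \otimes \mathcal{F}_2$. Let $A_i \in \mathcal{F}_1 \otimes \mathcal{F}_2$ for $i = 1, 2, \dots$
be a sequence of pairwise disjoint sets. Then the sections $(A_i)_{\omega_1}$ are also
pairwise disjoint for any $\omega_1 \in \Omega_1$, and

$$\begin{aligned}
\nu_1\left(\bigcup_{i=1}^{\infty} A_i\right)
&= \int_{\Omega_1} \mu_2\left(\left(\bigcup_{i=1}^{\infty} A_i\right)_{\omega_1}\right) d\mu_1(\omega_1)
= \int_{\Omega_1} \mu_2\left(\bigcup_{i=1}^{\infty} (A_i)_{\omega_1}\right) d\mu_1(\omega_1) \\
&= \int_{\Omega_1} \sum_{i=1}^{\infty} \mu_2((A_i)_{\omega_1})\,d\mu_1(\omega_1)
= \sum_{i=1}^{\infty} \int_{\Omega_1} \mu_2((A_i)_{\omega_1})\,d\mu_1(\omega_1) \\
&= \sum_{i=1}^{\infty} \nu_1(A_i)
\end{aligned}$$

by the monotone convergence theorem for a series (Exercise 1.24). Moreover,
$\nu_1$ is a finite measure since

$$\nu_1(\Omega_1 \times \Omega_2) = \mu_1(\Omega_1)\mu_2(\Omega_2) < \infty.$$

A similar argument applies to $\nu_2$. For measurable rectangles $A_1 \times A_2 \in \mathcal{R}$,
where $A_1 \in \mathcal{F}_1$ and $A_2 \in \mathcal{F}_2$, we have

$$\nu_1(A_1 \times A_2) = \mu_1(A_1)\mu_2(A_2) = \nu_2(A_1 \times A_2).$$

By Lemma 3.58, the measures $\nu_1$ and $\nu_2$ therefore coincide on
$\mathcal{F}_1 \otimes \mathcal{F}_2 = \sigma(\mathcal{R})$.

(iii) Since $\mu = \nu_1 = \nu_2$, we have already proved in part (ii) that $\mu$ is a
measure on $\mathcal{F}_1 \otimes \mathcal{F}_2$.

(iv) Uniqueness of $\mu$ follows directly from Lemma 3.58. □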


Exercise 3.42 Let $X$ be an integrable random variable on the probability
space $\Omega = [0, 1]$ with Borel sets and Lebesgue measure. Show
that if

$$\int_{\left(\frac{i}{2^n}, \frac{j}{2^n}\right]} X\,dm = 0$$

for any $n = 0, 1, \dots$ and for any $i, j = 0, 1, \dots, 2^n$ with $i < j$, then

$$\int_A X\,dm = 0$$

for every Borel set $A \subset [0, 1]$, and deduce that $X = 0$, $m$-a.s.


The proof of Theorem 3.27 is based on a similar technique, making use 
of d-systems. 


Theorem 3.27 

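Random variables $X, Y$ are independent if and only if their joint distribution
function $F_{X,Y}$ is the product of their individual distribution functions, that
is,

$$F_{X,Y}(x, y) = F_X(x)F_Y(y) \quad \text{for any } x, y \in \mathbb{R}.$$


Proof Since intervals are Borel sets, the necessity is obvious. For sufficiency,
the claim is that we need only check (3.8) for intervals of the form
$B_1 = (-\infty, x]$, $B_2 = (-\infty, y]$. We are then given that for all $x, y \in \mathbb{R}$

$$P(X \le x, Y \le y) = P(X \le x)P(Y \le y). \tag{3.14}$$

Now consider the class $\mathcal{C}$ of all Borel sets $A \in \mathcal{B}(\mathbb{R})$ such that for all $y \in \mathbb{R}$

$$P(X \in A, Y \le y) = P(X \in A)P(Y \le y), \tag{3.15}$$

and the class $\mathcal{D}$ of Borel sets $B \in \mathcal{B}(\mathbb{R})$ such that for all $A \in \mathcal{C}$

$$P(X \in A, Y \in B) = P(X \in A)P(Y \in B). \tag{3.16}$$

Our aim is to show that $\mathcal{C}$ and $\mathcal{D}$ are both equal to the Borel $\sigma$-field $\mathcal{B}(\mathbb{R})$.
By our assumption (3.14), $\mathcal{C}$ contains the collection of all intervals $(-\infty, x]$.
Since this collection is closed under intersection, we only need to check that
$\mathcal{C}$ is a d-system. This will mean that it contains all Borel sets, and so (3.15)
holds for all Borel sets. This in turn will mean that $\mathcal{D}$ contains all intervals
$(-\infty, y]$, hence to show that it contains $\mathcal{B}(\mathbb{R})$ we again only need to make
sure that $\mathcal{D}$ is a d-system.

We now check that $\mathcal{D}$ satisfies the conditions for a d-system; the proof
for $\mathcal{C}$ is almost identical. We have $\mathbb{R} \in \mathcal{D}$ since for all $A \in \mathcal{C}$

$$P(X \in A, Y \in \mathbb{R}) = P(X \in A) = P(X \in A)P(Y \in \mathbb{R}).$$

If $B \in \mathcal{D}$, then

$$\begin{aligned}
P(X \in A, Y \in \mathbb{R} \setminus B) &= P(X \in A, Y \in \mathbb{R}) - P(X \in A, Y \in B) \\
&= P(X \in A)P(Y \in \mathbb{R}) - P(X \in A)P(Y \in B) \\
&= P(X \in A)P(Y \in \mathbb{R} \setminus B).
\end{aligned}$$

The same computation with $\mathbb{R}$ replaced by any $B' \in \mathcal{D}$ such that $B \subset B'$ shows
that $B' \setminus B \in \mathcal{D}$. Finally, if $B_n \subset B_{n+1}$ with $B_n \in \mathcal{D}$ for all $n = 1, 2, \dots$
and $\bigcup_{n=1}^{\infty} B_n = B$, then

$$\begin{aligned}
P(X \in A, Y \in B) &= P\left(\bigcup_{n=1}^{\infty} \{X \in A, Y \in B_n\}\right) \\
&= \lim_{n\to\infty} P(X \in A, Y \in B_n) \\
&= P(X \in A) \lim_{n\to\infty} P(Y \in B_n) \\
&= P(X \in A)P(Y \in B).
\end{aligned}$$

Thus $\mathcal{D}$ is a d-system. By Proposition 3.56, $\mathcal{D}$ contains the $\sigma$-field generated
by the intervals $(-\infty, y]$, and so contains all Borel sets, which proves that (3.16)
holds for all pairs of Borel sets, that is, $X$ and $Y$ are independent. □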


4 


Conditional expectation 


4.1 Binomial stock prices 

4.2 Conditional expectation: discrete case 
4.3 Conditional expectation: general case 
4.4 The inner product space $L^2(P)$

4.5 Existence of $E(X|\mathcal{G})$ for integrable $X$
4.6 Proofs 


We turn our attention to the concept of conditioning, which involves ad- 
justing our expectations in the light of the knowledge we have gained of 
certain events or random variables. Building on the notion of the condi- 
tional probability defined in (3.11), we describe how knowledge of one 
random variable Y may cause us to review how likely the various outcomes 
of another random variable X are going to be. We adjust the probabilities 
for the values of X in the light of the information provided by the values 
of Y, by focusing on those scenarios for which these values of Y have oc- 
curred. This becomes especially important when we have a sequence, or 
even a continuous-parameter family, of random variables, and we consider 
how knowledge of the earlier terms will affect the later ones. We first il- 
lustrate these ideas in the simplest multi-step financial market model, since 
our main applications come from finance. 


4.1 Binomial stock prices 


Consider the binomial model of stock prices in order to illustrate the prob- 
abilistic ideas we will develop. This model, studied in detail in [DMFM], 
combines simplicity with flexibility. The general multi-step binomial model 
consists of repetitions of the single-step model. 




Single-step model Suppose that the current price $S(0)$ of some risky asset
(stock) is known, and its future price $S(T)$ at some fixed $T > 0$ is a random
variable $S(T) : \Omega \to [0, +\infty)$, taking just two values:

$$S(T) = \begin{cases} S(0)(1 + U) & \text{with probability } p, \\ S(0)(1 + D) & \text{with probability } 1 - p, \end{cases}$$

where $-1 < D < U$. The return

$$K = \frac{S(T) - S(0)}{S(0)}$$

is a random variable such that

$$K = \begin{cases} U & \text{with probability } p, \\ D & \text{with probability } 1 - p. \end{cases}$$

As the sample space we take a two-element set $\Omega = \{U, D\}$ equipped with a
probability $P$ determined by a single number $p \in (0, 1)$ such that $P(U) = p$
and $P(D) = 1 - p$.


Multi-step model All the essential features of a general multi-step model
are contained in a model with three time steps, where we take time to
be $0, h, 2h, 3h = T$. We simplify the notation by just specifying the
number of a step, ignoring its length $h$. The model involves stock prices
$S(0), S(1), S(2), S(3)$ at these four time instants, where $S(0)$ is a constant,
and $S(1), S(2), S(3)$ are random variables. The returns

$$K_n = \frac{S(n) - S(n-1)}{S(n-1)}$$

at each step $n = 1, 2, 3$ are independent random variables, and each has the
same distribution as the return $K$ in the single-step model. For the
probability space we take $\Omega = \{U, D\}^3$, which consists of eight triples, called
scenarios (or paths):

$$\Omega = \{UUU, UUD, UDU, UDD, DUU, DUD, DDU, DDD\}.$$

As $K_1, K_2, K_3$ are independent random variables, the probability of each
path is the product of the probabilities of the up/down price movements
along that path, namely

$$\begin{aligned}
P(UUU) &= p^3, \\
P(DUU) = P(UDU) = P(UUD) &= p^2(1 - p), \\
P(UDD) = P(DUD) = P(DDU) &= p(1 - p)^2, \\
P(DDD) &= (1 - p)^3.
\end{aligned}$$




The emerging binomial tree is recombining, as can be seen in the next 
example. 


Example 4.1 
Let $S(0) = 100$, $U = 0.1$, $D = -0.1$ and $p = 0.6$. The corresponding
binomial tree is

    S(0)      S(1)      S(2)      S(3)

                                  133.1
                        121
              110                 108.9
    100                 99
              90                  89.1
                        81
                                  72.9

where each upward branch has probability 0.6 and each downward branch
has probability 0.4.


The number X of upward movements in this binomial tree is a random 
variable with distribution 


\[
\begin{aligned}
P(X = 3) &= P(\{UUU\}) = p^3, \\
P(X = 2) &= P(\{DUU, UDU, UUD\}) = 3p^2(1 - p), \\
P(X = 1) &= P(\{UDD, DUD, DDU\}) = 3p(1 - p)^2, \\
P(X = 0) &= P(\{DDD\}) = (1 - p)^3,
\end{aligned}
\]


that is, X has binomial distribution (see Example 2.2). The expected price
after three steps is E(S(3)) ≈ 106.12.
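
The computations in Example 4.1 can be retraced numerically. The following is a minimal Python sketch, not part of the original text; the names `paths`, `prob`, `price` and `dist_X` are our own choices. It enumerates the eight scenarios, assigns each the product of the up/down probabilities along it, and recovers the distribution of X and the expectation of S(3).

```python
from collections import defaultdict
from itertools import product

S0, U, D, p = 100.0, 0.1, -0.1, 0.6   # data of Example 4.1

# The eight scenarios UUU, UUD, ..., DDD.
paths = ["".join(w) for w in product("UD", repeat=3)]

def prob(path):
    """Probability of a path: product of the up/down probabilities along it."""
    ups = path.count("U")
    return p**ups * (1 - p)**(len(path) - ups)

def price(path, n):
    """Stock price S(n) along the given path."""
    s = S0
    for move in path[:n]:
        s *= 1 + (U if move == "U" else D)
    return s

# Distribution of X = number of upward movements.
dist_X = defaultdict(float)
for w in paths:
    dist_X[w.count("U")] += prob(w)

# Expected price after three steps.
expected_S3 = sum(price(w, 3) * prob(w) for w in paths)

print(dict(dist_X))
print(round(expected_S3, 2))
```

The last printed value should agree with E(S(3)) ≈ 106.12 stated in the example, and the probabilities in `dist_X` should sum to 1 up to rounding.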


We use this example to analyse the changes of this expectation corre- 
sponding to the flow of information over time. 




Partitions and expectation 


Conditioning on the first step Suppose we know that the stock has gone
up in the first step. This means that the collection of available scenarios is
reduced to those beginning with U, which we denote by


Ω_U = {UUU, UUD, UDU, UDD}.


This set now plays the role of the probability space, which we need to equip
with a probability measure (the events will be all the subsets of Ω_U). This
is done by adjusting the original probabilities so that the new probability
of Ω_U is 1. For A ⊂ Ω_U we put

\[
P_U(A) = \frac{P(A)}{P(\Omega_U)}.
\]

Of course P_U(Ω_U) = 1. Since A ⊂ Ω_U, it follows that

\[
P_U(A) = \frac{P(A \cap \Omega_U)}{P(\Omega_U)} = P(A \mid \Omega_U),
\]

so the measure P_U is the conditional probability A ↦ P(A | Ω_U) considered
for subsets A ⊂ Ω_U.

If we know that the stock went down in the first step, we replace Ω_U by
the set Ω_D of all paths beginning with D, and for A ⊂ Ω_D we put

\[
P_D(A) = \frac{P(A)}{P(\Omega_D)} = P(A \mid \Omega_D).
\]

We have decomposed the set Ω = Ω_U ∪ Ω_D of all scenarios into two disjoint
subsets Ω_U, Ω_D, which motivates the following general definition.


Definition 4.2 

Let Ω be a non-empty set. A family 𝒫 = {B_1, B_2, ...} such that B_i ⊂ Ω for
i = 1, 2, ... is called a partition of Ω if B_i ∩ B_j = ∅ whenever i ≠ j and
Ω = ⋃_{i=1}^∞ B_i.


Note that we allow for the possibility that B_i = ∅ when i > n for some n,
so the partition may be finite or countably infinite.


Example 4.3 
𝒫 = {Ω_U, Ω_D} is a partition of Ω = {U, D}³.


For any discrete random variable X with values x_1, x_2, ... ∈ R the family
of all sets of the form {X = x_i} for i = 1, 2, ... is a partition of Ω. We call it
the partition generated by X.




Example 4.4 
In Ω = {U, D}³ the partition generated by S(1) from Example 4.1 is
{Ω_U, Ω_D}.


Recall from Definition 3.2 that for any collection C of subsets of Ω, the
σ-field generated by C is the smallest σ-field containing C.


Exercise 4.1 Show that the σ-field σ(𝒫) generated by a partition 𝒫
consists of all possible countable unions of the sets belonging to 𝒫.


Not all σ-fields are generated by partitions. For example, the σ-field of
Borel sets B(R) is not generated by a partition.

Although σ-fields are more general, we will work with partitions for the
present to develop better intuition in a relatively simple case.


Exercise 4.2 We call A an atom in a σ-field F if A ∈ F is non-empty
and there are no non-empty disjoint sets B, C ∈ F such that A = B ∪ C. Show
that if the family 𝒜 of all atoms in F is a partition, then F = σ(𝒜).


Example 4.5 

Continuing Example 4.1, we compute the expectation of S(3) in the new
probability space Ω_U = {S(1) = 110} with probability P_U. Since the paths
beginning with D are excluded, S(3) takes three values on Ω_U, and this
expectation, which we denote by E(S(3) | Ω_U) and call the conditional
expectation of S(3) given Ω_U, is equal to

\[
E(S(3) \mid \Omega_U) = 133.1 \times 0.6^2 + 108.9 \times 2 \times 0.6 \times 0.4 + 89.1 \times 0.4^2 \approx 114.44.
\]

In a similar manner we can compute the expectation of S(3) on Ω_D =
{S(1) = 90} with probability P_D, denote it by E(S(3) | Ω_D) and call it the
conditional expectation of S(3) given Ω_D. We obtain

\[
E(S(3) \mid \Omega_D) = 108.9 \times 0.6^2 + 89.1 \times 2 \times 0.6 \times 0.4 + 72.9 \times 0.4^2 \approx 93.64.
\]


These two cases can be combined by setting up a new random variable,
denoted by E(S(3) | S(1)) and called the conditional expectation of S(3)
given S(1):

\[
E(S(3) \mid S(1)) =
\begin{cases}
E(S(3) \mid \Omega_U) & \text{on } \Omega_U, \\
E(S(3) \mid \Omega_D) & \text{on } \Omega_D,
\end{cases}
\;\approx\;
\begin{cases}
114.44 & \text{on } \{S(1) = 110\}, \\
93.64 & \text{on } \{S(1) = 90\}.
\end{cases}
\]
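
The construction of E(S(3) | S(1)) can be checked with a short computation. This is only a sketch under our own naming (`cond_exp_given_event`, `Omega_U`, `Omega_D` are hypothetical helpers, not notation from the text): the conditional expectation given an event B with P(B) > 0 is computed as the P-weighted average of S(3) over B divided by P(B), which is the same as averaging under the adjusted measure P_U or P_D.

```python
from itertools import product

S0, U, D, p = 100.0, 0.1, -0.1, 0.6   # data of Example 4.1
paths = ["".join(w) for w in product("UD", repeat=3)]

def prob(path):
    """Probability of a single scenario."""
    ups = path.count("U")
    return p**ups * (1 - p)**(len(path) - ups)

def S3(path):
    """Stock price after three steps along the path."""
    s = S0
    for move in path:
        s *= 1 + (U if move == "U" else D)
    return s

def cond_exp_given_event(X, B):
    """E(X | B) as the sum of X(w) P(w) over w in B, divided by P(B) > 0."""
    P_B = sum(prob(w) for w in B)
    return sum(X(w) * prob(w) for w in B) / P_B

Omega_U = [w for w in paths if w[0] == "U"]
Omega_D = [w for w in paths if w[0] == "D"]

# The random variable E(S(3) | S(1)), stored as a map from scenarios to values:
# it is constant on Omega_U and constant on Omega_D.
cond_exp = {
    w: cond_exp_given_event(S3, Omega_U if w[0] == "U" else Omega_D)
    for w in paths
}

print(round(cond_exp["UUU"], 2), round(cond_exp["DDD"], 2))
```

The two printed numbers should be close to the values 114.44 and 93.64 obtained in Example 4.5.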


Conditioning on the first two steps Suppose now that we know the price
moves for the first two steps. There are four possibilities, which can be
described by specifying a partition of Ω into four disjoint sets:


Ω_UU = {UUU, UUD},
Ω_UD = {UDU, UDD},
Ω_DU = {DUU, DUD},
Ω_DD = {DDU, DDD}.


Each of these sets can be viewed as a probability space equipped with a
measure adjusted in a similar manner as before, for instance in Ω_UU we
have P_UU(A) = P(A | Ω_UU) for any A ⊂ Ω_UU, and similarly for P_UD, P_DU
and P_DD. We follow the setup in Example 4.1 to illustrate how this leads to
the conditional expectation of S(3) given S(2).


Example 4.6 
We compute the expected value of S(3) on each of the sets
Ω_UU, Ω_UD, Ω_DU, Ω_DD under the respective probability:

\[
\begin{aligned}
E(S(3) \mid \Omega_{UU}) &= 133.1 \times 0.6 + 108.9 \times 0.4 = 123.42, \\
E(S(3) \mid \Omega_{UD}) &= 108.9 \times 0.6 + 89.1 \times 0.4 = 100.98, \\
E(S(3) \mid \Omega_{DU}) &= 108.9 \times 0.6 + 89.1 \times 0.4 = 100.98, \\
E(S(3) \mid \Omega_{DD}) &= 89.1 \times 0.6 + 72.9 \times 0.4 = 82.62,
\end{aligned}
\]

introducing similar notation for these expectations as in Example 4.5. We
can see, in particular, that E(S(3) | Ω_UD) = E(S(3) | Ω_DU). Since S(2) has
the same value on Ω_UD and Ω_DU, this allows us to employ a random
variable denoted by E(S(3) | S(2)). To this end, note that the partition
generated by S(2) consists of three sets

Ω_UU = {S(2) = 121}, Ω_UD ∪ Ω_DU = {S(2) = 99}, Ω_DD = {S(2) = 81},

and the random variable E(S(3) | S(2)) takes three different values

\[
E(S(3) \mid S(2)) =
\begin{cases}
E(S(3) \mid \Omega_{UU}) & \text{on } \Omega_{UU}, \\
E(S(3) \mid \Omega_{UD}) = E(S(3) \mid \Omega_{DU}) & \text{on } \Omega_{UD} \cup \Omega_{DU}, \\
E(S(3) \mid \Omega_{DD}) & \text{on } \Omega_{DD},
\end{cases}
=
\begin{cases}
123.42 & \text{on } \{S(2) = 121\}, \\
100.98 & \text{on } \{S(2) = 99\}, \\
82.62 & \text{on } \{S(2) = 81\}.
\end{cases}
\]


The actual values of S(2) are irrelevant here since they do not appear in the
computations. What matters is the partition related to these values.
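
The role of the partition generated by S(2) can be illustrated as follows. In this sketch (our own code, with hypothetical names `blocks` and `cond_exp`) the scenarios are grouped by the value of S(2); prices are rounded before grouping only to absorb floating-point noise, which is an implementation detail and not part of the mathematics. The blocks for UD and DU merge into one set, and the conditional expectation is then computed block by block.

```python
from collections import defaultdict
from itertools import product

S0, U, D, p = 100.0, 0.1, -0.1, 0.6   # data of Example 4.1
paths = ["".join(w) for w in product("UD", repeat=3)]

def prob(path):
    ups = path.count("U")
    return p**ups * (1 - p)**(len(path) - ups)

def price(path, n):
    s = S0
    for move in path[:n]:
        s *= 1 + (U if move == "U" else D)
    return s

# Partition generated by S(2): one block {S(2) = value} per attained value.
blocks = defaultdict(list)
for w in paths:
    blocks[round(price(w, 2), 6)].append(w)

# E(S(3) | B) for each block B of that partition.
cond_exp = {}
for value, B in blocks.items():
    P_B = sum(prob(w) for w in B)
    cond_exp[value] = sum(price(w, 3) * prob(w) for w in B) / P_B

for value in sorted(blocks, reverse=True):
    print(value, sorted(blocks[value]), round(cond_exp[value], 2))
```

Only three blocks appear, in line with the three values of E(S(3) | S(2)) found in Example 4.6.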


4.2 Conditional expectation: discrete case 


The binomial example motivates a more general definition. Recall that
for any event B ∈ F such that P(B) > 0 and for any A ∈ F we know
from (3.11) that the conditional probability of A given B is

\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)}.
\]


Exercise 4.3 Show that

\[
P_B(A) = P(A \mid B)
\]

is a probability measure on B defined on the σ-field F_B consisting of
all events A ∈ F such that A ⊂ B.


Definition 4.7 

If X is a discrete random variable on Ω with finitely many distinct values
x_1, x_2, ..., x_n and B ∈ F is an event such that P(B) > 0, the conditional
expectation of X given B, denoted by E(X | B), can be defined as the
expectation of X restricted to B under the probability P_B.


This follows the same pattern as in Examples 4.5 and 4.6, and gives

\[
\begin{aligned}
E(X \mid B) &= \frac{1}{P(B)} \sum_{i=1}^{n} x_i P(\{X = x_i\} \cap B) \\
&= \frac{1}{P(B)} \sum_{i=1}^{n} x_i P(1_B X = x_i) = \frac{1}{P(B)} E(1_B X).
\end{aligned}
\]


We use this to extend the definition to any integrable random variable X.


Definition 4.8 
Given an integrable random variable X on Ω and an event B ∈ F with
P(B) > 0, the conditional expectation of X given B is defined as

\[
E(X \mid B) = \frac{1}{P(B)} E(1_B X). \tag{4.1}
\]


Exercise 4.4 For a random variable X with Poisson distribution find 
the conditional expectation of X given that the value of X is an odd 
number. 


As noted in the binomial example, given a partition of Ω we can piece
together the conditional expectations of X relative to the members of the
partition to obtain a random variable.


Definition 4.9
Given a partition 𝒫 = {B_1, B_2, ...} of Ω, the random variable E(X | 𝒫) :
Ω → R such that for each i = 1, 2, ...

\[
E(X \mid \mathcal{P})(\omega) = E(X \mid B_i) \quad \text{if } \omega \in B_i \text{ and } P(B_i) > 0
\]

is called the conditional expectation of X with respect to the partition 𝒫.


Note that in this definition the random variable E(X | 𝒫) remains
undefined when P(B_i) = 0. Since 𝒫 is a partition, this means that the function
E(X | 𝒫) is well defined P-a.s.

Applying (4.1), we can write

\[
E(X \mid \mathcal{P}) = \sum_{\substack{i = 1, 2, \ldots \\ P(B_i) > 0}} \frac{1}{P(B_i)} E(1_{B_i} X) 1_{B_i}. \tag{4.2}
\]


The above definition, applied to the partition of Ω generated by a discrete
random variable Y, leads to the following one.




Definition 4.10 

If X is an integrable random variable and Y is a discrete random variable
with values y_1, y_2, ..., then the conditional expectation E(X | Y) of X given
Y is the conditional expectation of X with respect to the partition 𝒫
generated by Y, that is, for each i = 1, 2, ...

\[
E(X \mid Y)(\omega) = E(X \mid \{Y = y_i\}) \quad \text{if } Y(\omega) = y_i \text{ and } P(Y = y_i) > 0.
\]


Exercise 4.5 On [0, 1] equipped with its Borel subsets and Lebesgue
measure, let Z be the random variable taking just two values, -1 on
[0, 1/2) and 1 on [1/2, 1], and let X be the random variable defined as
X(ω) = ω for each ω ∈ [0, 1]. Compute E(X | Z).


Exercise 4.6 Suppose that Z is the same random variable on [0, 1] as
in Exercise 4.5, and Y is the random variable defined as Y(ω) = 1 - ω
for each ω ∈ [0, 1]. Compute E(Y | Z).


Observe that if Y is constant on a subset of Ω, then E(X | Y) is also
constant on that subset. The values of E(X | Y) depend only on the subsets on
which Y is constant, not on the actual values of Y. For discrete random
variables Y and Z generating the same partition we always have the same
conditional expectations,


E(X | Y) = E(X | Z).


Exercise 4.7 Construct an example to show that, for random variables
V and W defining different partitions, in general we have


E(X | V) ≠ E(X | W).


Exercise 4.8 Let X, Y be random variables on Ω = {1, 2, ...}
equipped with the σ-field of all subsets of Ω and a probability measure
P such that P({n}) = 2 × 3^{-n}, X(n) = 2^n and Y(n) = (-1)^n for each
n = 1, 2, ... . Compute E(X | Y).




Properties of conditional expectation: discrete case 


We establish some of the basic properties of the random variable E(X | Y),
where X is an arbitrary integrable random variable (so that E(X) is
well-defined) and Y is any discrete random variable.

First, note that conditional expectation preserves linear combinations:
for any integrable random variables X_1, X_2 and a discrete random variable
Y, and for any numbers a, b ∈ R

\[
E(aX_1 + bX_2 \mid Y) = aE(X_1 \mid Y) + bE(X_2 \mid Y).
\]

This is an easy consequence of the definition of conditional expectation
and the linearity of expectation. If y_1, y_2, ... are the values of Y, then on
each set B_n = {Y = y_n} such that P(B_n) > 0 we have

\[
\begin{aligned}
E(aX_1 + bX_2 \mid Y) &= E(aX_1 + bX_2 \mid B_n) = \frac{1}{P(B_n)} E(1_{B_n}(aX_1 + bX_2)) \\
&= \frac{1}{P(B_n)} \left( aE(1_{B_n} X_1) + bE(1_{B_n} X_2) \right) \\
&= aE(X_1 \mid B_n) + bE(X_2 \mid B_n) = aE(X_1 \mid Y) + bE(X_2 \mid Y).
\end{aligned}
\]


Example 4.11 

To illustrate some further properties of E(X | Y), again consider stock price
evolution through a binomial tree. Starting with S(0) = 100, with returns
±20% in the first step and ±10% in the second, we obtain four values
S(2) = 132, 108, 88, 72. (You may find it helpful to draw the tree.) Suppose
that p = 3/4 at each step. Conditioning S(2) on S(1), we observe
that E(S(2) | S(1)) is constant on each of the sets Ω_U = {UU, UD} and
Ω_D = {DU, DD} in this two-step model, with values

\[
\begin{aligned}
E(S(2) \mid \Omega_U) &= 132p + 108(1 - p) = 126, \\
E(S(2) \mid \Omega_D) &= 88p + 72(1 - p) = 84.
\end{aligned}
\]

Hence E(S(2) | S(1)) equals 126 on Ω_U and 84 on Ω_D. The expectation of
this random variable is E(E(S(2) | S(1))) = 115.5. This equals E(S(2)), as
you may check.
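
The check suggested at the end of Example 4.11 can be carried out exactly with rational arithmetic. The sketch below is ours (names such as `returns`, `inner`, `lhs`, `rhs` are illustrative assumptions); it uses Python's `fractions.Fraction` so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

p = Fraction(3, 4)
returns = [Fraction(1, 5), Fraction(1, 10)]   # return sizes at steps 1 and 2
paths = ["".join(w) for w in product("UD", repeat=2)]

def prob(path):
    ups = path.count("U")
    return p**ups * (1 - p)**(len(path) - ups)

def price(path, n):
    """S(n) along the path, starting from S(0) = 100."""
    s = Fraction(100)
    for move, r in zip(path[:n], returns):
        s *= 1 + (r if move == "U" else -r)
    return s

def cond_exp_S2_given_first(move):
    """E(S(2) | B) for the block B of paths starting with the given move."""
    B = [w for w in paths if w[0] == move]
    P_B = sum(prob(w) for w in B)
    return sum(price(w, 2) * prob(w) for w in B) / P_B

inner = {m: cond_exp_S2_given_first(m) for m in "UD"}   # values of E(S(2) | S(1))
lhs = sum(inner[w[0]] * prob(w) for w in paths)         # E(E(S(2) | S(1)))
rhs = sum(price(w, 2) * prob(w) for w in paths)         # E(S(2))

print(inner, lhs, rhs)
```

Both `lhs` and `rhs` come out as the fraction 231/2, that is 115.5, in agreement with the example.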


In this example, therefore, the ‘average of the averages’ of S (2) over the 
sets in the partition generated by S(1) coincides with its overall average. 
This is true in general. 




Proposition 4.12 
When X is an integrable and Y a discrete random variable, the expectation
of E(X | Y) is equal to the expectation of X:

\[
E(E(X \mid Y)) = E(X). \tag{4.3}
\]


Proof Let y_1, y_2, ... be the values of Y. Writing B_n = {Y = y_n}, we can
assume without loss of generality that P(B_n) > 0 for all n = 1, 2, ... . Then
∑_{n=1}^∞ 1_{B_n} = 1, and we obtain

\[
\begin{aligned}
E(E(X \mid Y)) &= \sum_{n=1}^{\infty} E(X \mid B_n) P(B_n) = \sum_{n=1}^{\infty} \frac{1}{P(B_n)} E(1_{B_n} X) P(B_n) \\
&= \sum_{n=1}^{\infty} E(1_{B_n} X) = E(X).
\end{aligned}
\]


Note that dominated convergence in the form stated in Exercise 1.34 is
used in the last equality. □


Exercise 4.9 Let X be an integrable random variable and Y a discrete
random variable. Verify that for any B ∈ σ(Y)

\[
E(1_B E(X \mid Y)) = E(1_B X).
\]


Example 4.13 

The two-step stock-price model in Example 4.11 provides two partitions of
Ω = {U, D}², partition 𝒫_1 defined by S(1) consisting of two sets Ω_U, Ω_D,
and 𝒫_2 determined by S(2) consisting of four sets Ω_UU, Ω_UD, Ω_DU, Ω_DD (as
defined earlier). Notice that Ω_U = Ω_UU ∪ Ω_UD and Ω_D = Ω_DU ∪ Ω_DD. It
reflects the fact that S(2) carries more information than S(1), that is, if the
value of S(2) becomes known, then we will also know the value of S(1).


This gives rise to the following definition.


Definition 4.14

Given two partitions 𝒫_1, 𝒫_2, we say 𝒫_2 is finer than (or refines) 𝒫_1
(equivalently, that 𝒫_1 is coarser than 𝒫_2) whenever each element of 𝒫_1 can be
represented as a union of sets from 𝒫_2.




Remark 4.15 

The tree constructed in Example 4.11 is not recombining, which results
in partition 𝒫_2 being finer than 𝒫_1. In a recombining binomial tree, as
in Example 4.1, the partition defined by S(2), which consists of the sets
Ω_UU, Ω_UD ∪ Ω_DU, Ω_DD, is not finer than that generated by S(1), which
consists of the sets Ω_U = Ω_UU ∪ Ω_UD, Ω_D = Ω_DU ∪ Ω_DD. In a recombining tree
S(2) has only three values, and its middle value gives us no information
about the value of S(1).
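
The contrast drawn in Remark 4.15 can be tested mechanically on two-step models. The following sketch is an illustration of ours, not taken from the text: `partition_generated_by` and `is_finer` are hypothetical helper names, and for simplicity both trees are built on Ω = {U, D}², using the first two steps of Example 4.1 as the recombining case.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

omega = ["".join(w) for w in product("UD", repeat=2)]   # two-step scenarios

def make_price(returns):
    """Price process with return +r or -r at each step, starting from 100."""
    def price(path, n):
        s = Fraction(100)
        for move, r in zip(path[:n], returns):
            s *= 1 + (r if move == "U" else -r)
        return s
    return price

def partition_generated_by(X):
    """Blocks {X = x} of the partition generated by a discrete X on omega."""
    blocks = defaultdict(set)
    for w in omega:
        blocks[X(w)].add(w)
    return [frozenset(b) for b in blocks.values()]

def is_finer(P2, P1):
    # For two partitions of the same set, "each element of P1 is a union of
    # sets from P2" is equivalent to "each set of P2 lies inside a set of P1".
    return all(any(B2 <= B1 for B1 in P1) for B2 in P2)

models = {
    "non-recombining": make_price([Fraction(1, 5), Fraction(1, 10)]),   # Example 4.11
    "recombining": make_price([Fraction(1, 10), Fraction(1, 10)]),      # as in Example 4.1
}

results = {}
for name, price in models.items():
    P1 = partition_generated_by(lambda w: price(w, 1))
    P2 = partition_generated_by(lambda w: price(w, 2))
    results[name] = is_finer(P2, P1)

print(results)
```

In the recombining case the block {UD, DU} of the partition generated by S(2) is contained in neither Ω_U nor Ω_D, which is why the check fails there.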


Exercise 4.10 Suppose 𝒫_1 and 𝒫_2 are partitions of some set Ω. Show
that the coarsest partition which refines them both is that consisting of
all intersections A ∩ B, where A ∈ 𝒫_1 and B ∈ 𝒫_2.


Example 4.16 

We extend Example 4.11 by adding a third step with returns ±10%. The
stock price S(3) then takes the values 145.2, 118.8, 97.2, 96.8, 79.2, 64.8, as
you may confirm. (Note that there are only six values as the tree recombines
in two of its nodes.) The conditional expectation E(S(3) | S(2)) is calculated
in a similar manner as for E(S(2) | S(1)). Its sets of constancy are those
of the partition 𝒫_2 generated by S(2), that is, Ω_UU, Ω_UD, Ω_DU, Ω_DD. The
corresponding values of E(S(3) | S(2)) are 138.6, 113.4, 92.4, 75.6. We now
condition the random variable E(S(3) | S(2)) on the values of S(1):

\[
\begin{aligned}
E(E(S(3) \mid S(2)) \mid \Omega_U) &= 138.60 \times p + 113.40 \times (1 - p) = 132.3, \\
E(E(S(3) \mid S(2)) \mid \Omega_D) &= 92.40 \times p + 75.60 \times (1 - p) = 88.2,
\end{aligned}
\]

given that p = 3/4. We compare this with the constant values taken by
E(S(3) | S(1)) on these two sets:

\[
\begin{aligned}
E(S(3) \mid \Omega_U) &= 145.20 \times p^2 + 118.80 \times 2p(1 - p) + 97.20 \times (1 - p)^2 \\
&= 138.60 \times p + 113.40 \times (1 - p) = 132.3, \\
E(S(3) \mid \Omega_D) &= 96.80 \times p^2 + 79.20 \times 2p(1 - p) + 64.80 \times (1 - p)^2 \\
&= 92.40 \times p + 75.60 \times (1 - p) = 88.2.
\end{aligned}
\]


Hence E(E(S(3) | S(2)) | S(1)) = E(S(3) | S(1)), which is again a particular
case of an important general result.
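
The identity found in Example 4.16 can be confirmed scenario by scenario. This is a sketch under our own conventions: `cond_exp(X, Y)` is a hypothetical helper returning E(X | Y) as a dictionary from scenarios to values, computed from the blocks {Y = y}, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

p = Fraction(3, 4)
returns = [Fraction(1, 5), Fraction(1, 10), Fraction(1, 10)]   # Example 4.16
paths = ["".join(w) for w in product("UD", repeat=3)]

def prob(path):
    ups = path.count("U")
    return p**ups * (1 - p)**(len(path) - ups)

def price(path, n):
    s = Fraction(100)
    for move, r in zip(path[:n], returns):
        s *= 1 + (r if move == "U" else -r)
    return s

def cond_exp(X, Y):
    """E(X | Y) as a dict scenario -> value, for discrete Y on the finite space."""
    result = {}
    for w in paths:
        B = [v for v in paths if Y(v) == Y(w)]        # block {Y = Y(w)}
        P_B = sum(prob(v) for v in B)
        result[w] = sum(X(v) * prob(v) for v in B) / P_B
    return result

S1 = lambda w: price(w, 1)
S2 = lambda w: price(w, 2)
S3 = lambda w: price(w, 3)

inner = cond_exp(S3, S2)                    # E(S(3) | S(2))
lhs = cond_exp(lambda w: inner[w], S1)      # E(E(S(3) | S(2)) | S(1))
rhs = cond_exp(S3, S1)                      # E(S(3) | S(1))

print(lhs == rhs, rhs["UUU"], rhs["DDD"])
```

The two dictionaries coincide, with value 1323/10 = 132.3 on Ω_U and 441/5 = 88.2 on Ω_D.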




Proposition 4.17 (tower property) 
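Let X be an integrable random variable and let Y, Z be discrete random
variables such that Y generates a finer partition than Z. Then

\[
E(E(X \mid Y) \mid Z) = E(X \mid Z). \tag{4.4}
\]


Proof Let y_1, y_2, ... be the values of Y and z_1, z_2, ... the values of Z. We
can assume without loss of generality that the sets B_i = {Y = y_i} and
C_j = {Z = z_j} in the partitions generated, respectively, by Y and Z are such
that P(B_i), P(C_j) > 0 for each i, j = 1, 2, ... . Because Y generates a finer
partition than Z, for any j = 1, 2, ... we can write C_j = ⋃_{i ∈ I_j} B_i for some
set of indices I_j ⊂ {1, 2, ...}. For any ω ∈ C_j, by (4.1) and (4.2),

\[
\begin{aligned}
E(E(X \mid Y) \mid Z)(\omega) &= \frac{1}{P(C_j)} E\left( E(X \mid Y) 1_{C_j} \right) \\
&= \frac{1}{P(C_j)} E\left( \sum_{i=1}^{\infty} \frac{1}{P(B_i)} E(1_{B_i} X) 1_{B_i} 1_{C_j} \right).
\end{aligned}
\]

But B_i ⊂ C_j for i ∈ I_j, so 1_{B_i} 1_{C_j} = 1_{B_i}, while 1_{B_i} 1_{C_j} = 0 for
i ∉ I_j because then B_i and C_j are disjoint. Moreover, ∑_{i ∈ I_j} 1_{B_i} = 1_{C_j}.
It follows that

\[
\begin{aligned}
\frac{1}{P(C_j)} E\left( \sum_{i \in I_j} \frac{1}{P(B_i)} E(1_{B_i} X) 1_{B_i} \right)
&= \frac{1}{P(C_j)} \sum_{i \in I_j} \frac{1}{P(B_i)} E(1_{B_i} X) P(B_i) \\
&= \frac{1}{P(C_j)} \sum_{i \in I_j} E(1_{B_i} X) \\
&= \frac{1}{P(C_j)} E(1_{C_j} X) = E(X \mid Z)(\omega).
\end{aligned}
\]

Once again, dominated convergence in the form stated in Exercise 1.34 is
used here. □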
Example 4.18 


To motivate the next property, we return to Example 4.16 and consider
E(S(1)S(3) | Ω_UU). This conditional expectation boils down to summation
over ω ∈ Ω_UU, but for such scenarios S(1) is constant (and equal to
120), so it can be taken outside the sum, which gives E(S(1)S(3) | Ω_UU) =
S(1)E(S(3) | Ω_UU). If we repeat the same argument for Ω_UD, Ω_DU and Ω_DD,
we discover that E(S(1)S(3) | S(2)) = S(1)E(S(3) | S(2)). In other words,
when conditioning on the second step, we can take out the ‘known’ value
S(1). This is again a general feature of conditioning.
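
The following sketch (our own code; `check_taking_out` is a hypothetical helper) tests the identity E(S(1)S(3) | S(2)) = S(1)E(S(3) | S(2)) on every scenario. As an additional illustration that is not in the text, it also runs the same test on the recombining tree of Example 4.1, where S(2) does not determine S(1) and the identity should not be expected to hold.

```python
from fractions import Fraction
from itertools import product

paths = ["".join(w) for w in product("UD", repeat=3)]

def check_taking_out(returns, p):
    """True if E(S(1)S(3) | S(2)) = S(1) E(S(3) | S(2)) on every scenario."""
    def prob(path):
        ups = path.count("U")
        return p**ups * (1 - p)**(3 - ups)

    def price(path, n):
        s = Fraction(100)
        for move, r in zip(path[:n], returns):
            s *= 1 + (r if move == "U" else -r)
        return s

    def cond_exp_given_S2(X, w):
        B = [v for v in paths if price(v, 2) == price(w, 2)]   # block {S(2) = S(2)(w)}
        return sum(X(v) * prob(v) for v in B) / sum(prob(v) for v in B)

    return all(
        cond_exp_given_S2(lambda v: price(v, 1) * price(v, 3), w)
        == price(w, 1) * cond_exp_given_S2(lambda v: price(v, 3), w)
        for w in paths
    )

tenth, fifth = Fraction(1, 10), Fraction(1, 5)
holds_example_4_16 = check_taking_out([fifth, tenth, tenth], Fraction(3, 4))
holds_example_4_1 = check_taking_out([tenth, tenth, tenth], Fraction(3, 5))

print(holds_example_4_16, holds_example_4_1)
```

The first check succeeds because the partition generated by S(2) in Example 4.16 is finer than that generated by S(1); the second fails on the middle block {S(2) = 99}, where S(1) takes both values 110 and 90.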


Proposition 4.19 (taking out what is known) 
Assume that X is integrable and Y, Z are discrete random variables such
that Y generates a finer partition than Z. In this case

\[
E(ZX \mid Y) = ZE(X \mid Y). \tag{4.5}
\]


Proof Fix a set B belonging to the partition generated by Y such that
P(B) > 0, and notice that Z is constant on B, taking value z, say. Then

\[
E(ZX \mid B) = \frac{1}{P(B)} E(1_B ZX) = \frac{1}{P(B)} E(z 1_B X) = \frac{z}{P(B)} E(1_B X) = zE(X \mid B),
\]

so the result holds on B. By gluing together the formulae obtained for each
such B we complete the argument. □


The intuition behind this result is that once the value of Y becomes
known, we will also know that of Z, and therefore can treat it as if it were
a constant rather than a random variable, moving it out in front of the
conditional expectation. In particular, for X = 1, we have E(Z | Y) = Z.

The final property we consider here extends a familiar feature of
independent events, namely the fact that the conditional probability of an
event A is not sensitive to conditioning on an event B that is independent
of A, that is, P(A | B) = P(A).


Exercise 4.11 Prove that E(X | Y) = E(X) if X and Y are independent 
random variables and Y is discrete. 


4.3 Conditional expectation: general case 


Let Y be a uniformly distributed random variable on [0, 1]. Then the event
B_y = {Y = y} has probability P(B_y) = 0 for every y ∈ [0, 1]. In such
situations the definition of conditional probability P(A | B_y) as
P(A ∩ B_y)/P(B_y) no longer makes sense, nor is there a partition generated
by Y, and we need a different approach. This can be achieved by turning
matters on their head, defining conditional expectation in the general case
by means of certain properties which follow from the special case of
conditioning with respect to a discrete random variable or a partition.

Suppose Y is a discrete random variable and B is a set from the partition
generated by Y. From Proposition 4.19 we know that

\[
1_B E(X \mid Y) = E(1_B X \mid Y).
\]

Applying expectation to both sides and using Proposition 4.12, we get

\[
E(1_B E(X \mid Y)) = E(1_B X). \tag{4.6}
\]

These equalities are also valid for each B ∈ σ(Y), since each such B can
be expressed as a countable union of disjoint sets from the partition
generated by Y. We also note that the conditional expectation E(X | Y) is
σ(Y)-measurable and does not depend on the actual values of Y but just on the
partition generated by Y, or equivalently on the σ-field σ(Y). We could
therefore denote E(X | Y) by E(X | σ(Y)).

These observations are very useful in the general case of an arbitrary
random variable Y, when there may be no partition generated by Y, but we
do have the σ-field σ(Y) generated by Y. This gives rise to the following
definition of conditional expectation with respect to a σ-field.


Definition 4.20 
Let X be an integrable random variable on a probability space (Ω, F, P).
The conditional expectation of X with respect to a σ-field G ⊂ F is
defined as a random variable, denoted by E(X | G), that satisfies the following
two conditions:

(i) E(X | G) is G-measurable;

(ii) for each B ∈ G

\[
E(1_B E(X \mid \mathcal{G})) = E(1_B X).
\]

When G = σ(Y) for some random variable Y on the same probability space,
then we shall write E(X | Y) in place of E(X | G) and call it the conditional
expectation of X given Y. In other words,

\[
E(X \mid Y) = E(X \mid \sigma(Y)).
\]

The first condition is a general counterpart of the condition that the
conditional expectation should be constant on the atoms of G in the discrete
case, the second extends (4.6).

At this point we have no guarantee that a random variable with the
properties (i), (ii) exists, nor that it is uniquely defined if it does exist. We defer
this question for the moment, and will return to it after discussing the
properties implied by Definition 4.20.
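
On a finite probability space the two conditions of Definition 4.20 can be verified directly. The sketch below is an illustration of ours, assuming the recombining model of Example 4.1 with G = σ(S(2)); names such as `atoms`, `cond` and `sigma_field` are hypothetical. The candidate for E(S(3) | G) is constant on the atoms of G, which gives (i), and condition (ii) is tested for every B ∈ G, that is, for every union of atoms.

```python
from fractions import Fraction
from itertools import combinations, product

p = Fraction(3, 5)
r = Fraction(1, 10)
paths = ["".join(w) for w in product("UD", repeat=3)]

def prob(path):
    ups = path.count("U")
    return p**ups * (1 - p)**(3 - ups)

def price(path, n):
    s = Fraction(100)
    for move in path[:n]:
        s *= 1 + (r if move == "U" else -r)
    return s

# Atoms of G = sigma(S(2)): the blocks {S(2) = value}.
by_value = {}
for w in paths:
    by_value.setdefault(price(w, 2), []).append(w)
atoms = list(by_value.values())

# Candidate for E(S(3) | G): constant on each atom, hence G-measurable.
cond = {}
for A in atoms:
    value = sum(price(w, 3) * prob(w) for w in A) / sum(prob(w) for w in A)
    for w in A:
        cond[w] = value

def expectation_on(f, B):
    """E(1_B f) on the finite space."""
    return sum(f(w) * prob(w) for w in B)

# Every set in G is a union of atoms (including the empty union).
sigma_field = [sum(combo, []) for k in range(len(atoms) + 1)
               for combo in combinations(atoms, k)]

condition_ii = all(
    expectation_on(lambda w: cond[w], B) == expectation_on(lambda w: price(w, 3), B)
    for B in sigma_field
)
print(len(sigma_field), condition_ii)
```

With three atoms the σ-field has 2³ = 8 members, and condition (ii) holds for each of them.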




Exercise 4.12 On the probability space [0,1] with Borel sets and 
Lebesgue measure compute E(X | Y) when X(w) = lo - 1| and Y(w) = 
lw — 5| for w € [0, 1]. 


Properties of conditional expectation: general case 


All the properties we proved for the discrete case can also be proved with 
Definition 4.20. We summarise them in the following exercises and propo- 
sitions. 


Exercise 4.13 Let X, Y be integrable random variables on (Ω, F, P)
and let G be a sub-σ-field of F. Show that, P-a.s.,
(1) E((aX + bY) | G) = aE(X | G) + bE(Y | G) for any a, b ∈ R (linearity);
(2) E(X | G) ≥ 0 if X ≥ 0 (positivity).


Remark 4.21
As in Exercise 4.13, identities and inequalities involving conditional
expectation (as a random variable) should be read as holding P-a.s. in
what follows.


Proposition 4.22 (tower property) 
If X is integrable and H ⊂ G, then

\[
E(E(X \mid \mathcal{G}) \mid \mathcal{H}) = E(X \mid \mathcal{H}).
\]


Proof Write Y = E(X | G) and take any A ∈ H. We need to show that
E(1_A Y) = E(1_A E(X | H)). By the definition of conditional expectation with
respect to G (A ∈ G since H ⊂ G), we have

\[
E(1_A Y) = E(1_A E(X \mid \mathcal{G})) = E(1_A X).
\]

By the definition of conditional expectation with respect to H,

\[
E(1_A E(X \mid \mathcal{H})) = E(1_A X),
\]

which concludes the proof. □


Corollary 4.23 
For any integrable random variable X 


E(E(X|G)) = E(X). 




Proof Since 1_Ω = 1 and Ω ∈ G, the definition of conditional expectation
with respect to G applies:

\[
E(E(X \mid \mathcal{G})) = E(1_\Omega E(X \mid \mathcal{G})) = E(1_\Omega X) = E(X). \qquad \square
\]


Exercise 4.14 Prove the following monotone convergence theorem
for conditional expectations.

If X_n for n = 1, 2, ... is a non-decreasing sequence of integrable
random variables such that lim_{n→∞} X_n = X, P-a.s., then their conditional
expectations E(X_n | G) form a non-decreasing sequence of non-negative
integrable random variables such that lim_{n→∞} E(X_n | G) =
E(X | G), P-a.s.


If Z is a G-measurable random variable, we can ‘take it outside’ the 
conditional expectation of the product XZ; this accords with the intuition 
that Z is ‘known’ once we know G and can therefore be treated like a 
constant when conditioning on G, exactly as in the discrete case. 


Proposition 4.24 (taking out what is known) 
If both X and XZ are integrable and Z is G-measurable, then

\[
E(ZX \mid \mathcal{G}) = ZE(X \mid \mathcal{G}).
\]


Proof We may assume X ≥ 0 by linearity. Take any B ∈ G. We have to
show that

\[
E(1_B ZX) = E(1_B Z E(X \mid \mathcal{G})).
\]

Let Z = 1_A for some A ∈ G. Then, since A ∩ B ∈ G,

\[
\begin{aligned}
E(1_B ZX) &= E(1_B 1_A X) = E(1_{A \cap B} X) = E(1_{A \cap B} E(X \mid \mathcal{G})) \\
&= E(1_B 1_A E(X \mid \mathcal{G})) = E(1_B Z E(X \mid \mathcal{G})).
\end{aligned}
\]


By linearity, we have E(ZX | G) = ZE(X | G) for any simple G-measurable
random variable Z. For any G-measurable Z ≥ 0 we use Exercise 4.14
and a sequence Z_1, Z_2, ... of non-negative simple G-measurable functions
increasing to Z to conclude that E(Z_n X | G) increases to E(ZX | G), while
Z_n E(X | G) increases to ZE(X | G). Since E(Z_n X | G) = Z_n E(X | G) for each
n, and the limit on the left as n → ∞ is P-a.s. finite by our integrability
assumption, we have E(ZX | G) = ZE(X | G) as required. For general Z
we can take positive and negative parts of Z and apply linearity of the
conditional expectation. □




Corollary 4.25 
E(Z|G) = Z if Z is integrable and G-measurable. 


Proof Take X = 1 in Proposition 4.24. □


At the other extreme, independence of random variables X, Y means that
knowing one ‘tells us nothing’ about the other. Recall that random variables
X, Y are independent if and only if their generated σ-fields σ(X), σ(Y) are
independent, see Exercise 3.31. Moreover, recall that X is independent of
a σ-field G ⊂ F precisely when the σ-fields σ(X) and G are independent.
In that case E(X | G) is constant, as the next result shows.


Proposition 4.26 (independence)
If X is integrable and independent of G, then

\[
E(X \mid \mathcal{G}) = E(X).
\]

Proof For any B ∈ G the random variables 1_B, X are independent, so

\[
E(1_B X) = E(1_B) E(X) = E(1_B E(X)),
\]

which shows that the constant random variable E(X) satisfies (4.6). Since
E(X) is also G-measurable, it satisfies the definition of E(X | G). □


In the next two results we use the fact that for independent random vari- 
ables or random vectors their joint distribution is simply the product of 
the individual distributions. The first result will be used in the analysis of 
the Black-Scholes model in [BSM]; the second is a special case which 
becomes crucial for the development of Markov processes in [SCF]. Both 
follow easily from the Fubini theorem. 


Theorem 4.27 

Let (Ω, F, P) be a probability space, and let G ⊂ F be a σ-field. Suppose
that X : Ω → R is a G-measurable random variable and Y : Ω → R is
a random variable independent of G. If f : R² → R is a bounded Borel
measurable function, then g_f : R → R defined for any x ∈ R by

\[
g_f(x) = E(f(x, Y)) = \int_{\mathbb{R}} f(x, y)\, dP_Y(y)
\]

is a bounded Borel measurable function, and we have

\[
E(f(X, Y) \mid \mathcal{G}) = g_f(X), \quad P\text{-a.s.} \tag{4.7}
\]




Proof We know from Proposition 3.17 that g_f is a Borel measurable
function. It follows that g_f(X) is σ(X)-measurable. By the definition of
conditional expectation it suffices to show that

\[
E[1_G f(X, Y)] = E[g_f(X) 1_G] \quad \text{for each } G \in \mathcal{G}.
\]

By hypothesis, σ(Y) and G are independent σ-fields. For any bounded
G-measurable random variable Z the σ-field σ(X, Z) generated by the
random vector (X, Z) is contained in G, hence Y and (X, Z) are independent.
This means that their joint distribution is the product measure P_{X,Z} ⊗ P_Y
(see Remark 3.40). Applying Fubini’s theorem, we obtain

\[
\begin{aligned}
E(f(X, Y)Z) &= \int_{\mathbb{R}^3} f(x, y) z \, d(P_{X,Z} \otimes P_Y)(x, z, y) \\
&= \int_{\mathbb{R}^2} \left( \int_{\mathbb{R}} f(x, y) z \, dP_Y(y) \right) dP_{X,Z}(x, z) \\
&= \int_{\mathbb{R}^2} g_f(x) z \, dP_{X,Z}(x, z) \\
&= E(g_f(X) Z).
\end{aligned}
\]


Applying this with Z = 1_G proves (4.7). □
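
Theorem 4.27 can be illustrated in the binomial model of Example 4.1. The sketch below is ours and rests on the following assumptions: X = S(2), Y = K_3 (the return in the third step), G is generated by the first two price moves, and f(x, y) = x(1 + y), so that f(X, Y) = S(3). This f is not bounded on R², but only finitely many values occur here, so boundedness plays no role in the finite computation.

```python
from fractions import Fraction
from itertools import product

p = Fraction(3, 5)
up, down = Fraction(1, 10), Fraction(-1, 10)
paths = ["".join(w) for w in product("UD", repeat=3)]

def prob(path):
    ups = path.count("U")
    return p**ups * (1 - p)**(3 - ups)

def K(path, n):
    """Return in step n along the path."""
    return up if path[n - 1] == "U" else down

def S2(path):
    return 100 * (1 + K(path, 1)) * (1 + K(path, 2))

def f(x, y):
    return x * (1 + y)            # f(S(2), K_3) = S(3)

def g_f(x):
    """g_f(x) = E(f(x, Y)), computed from the distribution of Y = K_3."""
    return f(x, up) * p + f(x, down) * (1 - p)

def cond_exp_given_G(w):
    """E(f(X, Y) | G)(w), where the atoms of G fix the first two moves."""
    B = [v for v in paths if v[:2] == w[:2]]
    return sum(f(S2(v), K(v, 3)) * prob(v) for v in B) / sum(prob(v) for v in B)

agrees = all(cond_exp_given_G(w) == g_f(S2(w)) for w in paths)
print(agrees, float(g_f(Fraction(121))))
```

Here g_f(x) = 1.02 x, so g_f(S(2)) reproduces the values 123.42, 100.98 and 82.62 of E(S(3) | S(2)) from Example 4.6.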


In the special case where G = σ(X) for some random variable X : Ω → R,
the theorem reduces to the following


Corollary 4.28

Let (Ω, F, P) be a probability space, and suppose that X : Ω → R and
Y : Ω → R are independent random variables. If f : R² → R is a bounded
Borel measurable function, then g_f : R → R defined for any x ∈ R by

\[
g_f(x) = E(f(x, Y)) = \int_{\mathbb{R}} f(x, y)\, dP_Y(y)
\]

is a bounded Borel measurable function, and we have

\[
E(f(X, Y) \mid \sigma(X)) = g_f(X).
\]


Exercise 4.15 Extend Theorem 4.27 to the case of random vectors
X, Y with values in R^m and R^n, respectively, and a function f : R^m ×
R^n → R.




Now suppose that Z ≥ 0 is a non-negative random variable on a
probability space (Ω, F, P) such that E(Z) = 1. It can be used to define a new
probability measure Q such that for each A ∈ F

\[
Q(A) = E(1_A Z).
\]

We know from Theorem 1.35 that Q is indeed a measure. It is a probability
measure because Q(Ω) = E(Z) = 1.

Since we now have two probability measures P and Q, we need to
distinguish between the corresponding expectations by writing E_P and E_Q,
respectively. For any B ∈ F we have

\[
E_Q(1_B) = Q(B) = E_P(1_B Z).
\]

By linearity this extends to

\[
E_Q(s) = E_P(sZ)
\]

for any simple function s. Approximating any non-negative random
variable X by a non-decreasing sequence of simple functions, we obtain by
monotone convergence that

\[
E_Q(X) = E_P(XZ). \tag{4.8}
\]

Finally, we can extend the last identity to any random variable X integrable
under Q by considering X⁺ and X⁻ and using linearity once again. This
gives a relationship between the expectation under Q and that under P. The
next result, which will be needed in [BSM], extends this to conditional
expectation.


Lemma 4.29 (Bayes formula) 

Let Z ≥ 0 be a random variable such that E_P(Z) = 1 and let Q(A) =
E_P(1_A Z) for each A ∈ F. For any integrable random variable X under Q
and for any σ-field G ⊂ F

\[
E_Q(X \mid \mathcal{G}) E_P(Z \mid \mathcal{G}) = E_P(XZ \mid \mathcal{G}).
\]


Proof For any B ∈ G we apply (4.8) and the definition of conditional
expectation to get

\[
E_P(1_B E_P(XZ \mid \mathcal{G})) = E_P(1_B XZ) = E_Q(1_B X) = E_Q(1_B E_Q(X \mid \mathcal{G})).
\]

Now we use (4.8) again and then the tower property and the fact that 1_B
and E_Q(X | G) are G-measurable to write the last expression as

\[
\begin{aligned}
E_Q(1_B E_Q(X \mid \mathcal{G})) &= E_P(1_B E_Q(X \mid \mathcal{G}) Z) \\
&= E_P(E_P(1_B E_Q(X \mid \mathcal{G}) Z \mid \mathcal{G})) \\
&= E_P(1_B E_Q(X \mid \mathcal{G}) E_P(Z \mid \mathcal{G})).
\end{aligned}
\]

Since E_Q(X | G)E_P(Z | G) is G-measurable, this proves the Bayes formula.
□
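
The Bayes formula can be checked numerically on the three-step binomial space. The sketch below is our own illustration under assumed data: P uses the up-probability 3/5 of Example 4.1, Q uses an up-probability 1/2 (an arbitrary choice made for this example only), Z is the ratio of the path probabilities, X = S(3), and G is generated by the first step.

```python
from fractions import Fraction
from itertools import product

p, q = Fraction(3, 5), Fraction(1, 2)    # up-probabilities under P and Q (q is assumed)
r = Fraction(1, 10)
paths = ["".join(w) for w in product("UD", repeat=3)]

def path_prob(path, up):
    ups = path.count("U")
    return up**ups * (1 - up)**(3 - ups)

P = {w: path_prob(w, p) for w in paths}
Q = {w: path_prob(w, q) for w in paths}
Z = {w: Q[w] / P[w] for w in paths}      # Z >= 0, E_P(Z) = 1, Q(A) = E_P(1_A Z)

def S3(path):
    s = Fraction(100)
    for move in path:
        s *= 1 + (r if move == "U" else -r)
    return s

def cond_exp(X, measure, B):
    """E(X | B) under the given measure, for an atom B of G."""
    return sum(X(w) * measure[w] for w in B) / sum(measure[w] for w in B)

# Atoms of G: paths starting with U and paths starting with D.
atoms = [[w for w in paths if w[0] == m] for m in "UD"]

bayes_holds = all(
    cond_exp(S3, Q, B) * cond_exp(lambda w: Z[w], P, B)
    == cond_exp(lambda w: S3(w) * Z[w], P, B)
    for B in atoms
)
print(sum(Z[w] * P[w] for w in paths), bayes_holds)
```

Since all quantities are exact fractions, the equality E_Q(X | G)E_P(Z | G) = E_P(XZ | G) is verified without rounding on both atoms.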


Conditional density 


When X is a continuous random variable with density f_X and g : R → R
is a Borel measurable function such that g(X) is integrable, the expectation
of g(X) can be written as

\[
E(g(X)) = \int_{\mathbb{R}} g(x) f_X(x)\, dm(x). \tag{4.9}
\]


For two jointly continuous random variables X, Y we would like to write
the conditional expectation E(g(X) | Y) in a similar manner. Since the
conditional expectation is a σ(Y)-measurable random variable, we need to
express it as a Borel measurable function of Y. We know that for any Borel
set B ∈ B(R)

\[
E(1_B(Y) g(X)) = E(1_B(Y) E(g(X) \mid Y)).
\]

We can write the left-hand side in terms of the joint density f_{X,Y} and use
Fubini’s theorem to transform it as follows:

\[
\begin{aligned}
E(1_B(Y) g(X)) &= \int_{\mathbb{R} \times B} g(x) f_{X,Y}(x, y)\, dm_2(x, y) \\
&= \int_B \left( \int_{\mathbb{R}} g(x) f_{X,Y}(x, y)\, dm(x) \right) dm(y) \\
&= \int_B \left( \int_{\mathbb{R}} g(x) \frac{f_{X,Y}(x, y)}{f_Y(y)}\, dm(x) \right) dP_Y(y).
\end{aligned}
\tag{4.10}
\]

Dividing by the marginal density f_Y is all right because f_Y ≠ 0, P_Y-a.s.,
that is, for C = {y ∈ R : f_Y(y) = 0} we have

\[
P_Y(C) = \int_C f_Y(y)\, dm(y) = 0.
\]

The fraction appearing in (4.10) is what we are looking for to play a role
similar to the density f_X(x) in (4.9).




Definition 4.30 

We define the conditional density of X given Y as

\[
h(x, y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}
\]

for any x, y ∈ R such that f_Y(y) ≠ 0, and put h(x, y) = 0 otherwise.


This allows us to write

\[
E(1_B(Y) g(X)) = E\left( 1_B(Y) \int_{\mathbb{R}} g(x) h(x, Y)\, dm(x) \right).
\]

Since h(x, Y) is a σ(Y)-measurable random variable for each x ∈ R, it
follows that ∫_R g(x)h(x, Y) dm(x) is σ(Y)-measurable. We have just proved
the following result.


Proposition 4.31 
If X, Y are jointly continuous random variables and g : R → R is a Borel
measurable function such that g(X) is integrable, then

\[
E(g(X) \mid Y) = \int_{\mathbb{R}} g(x) h(x, Y)\, dm(x),
\]

where h(x, y) for x, y ∈ R is the conditional density of X given Y.


Note that this result provides an immediate alternative proof (valid only
for jointly continuous random variables, of course) of Proposition 4.26:
if X, Y are jointly continuous and independent, and X is integrable, then
f_{X,Y}(x, y) = f_X(x) f_Y(y), so h(x, y) = f_X(x) f_Y(y)/f_Y(y) = f_X(x), hence

\[
E(X \mid Y) = \int_{\mathbb{R}} x h(x, Y)\, dm(x) = \int_{\mathbb{R}} x f_X(x)\, dm(x) = E(X).
\]
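
As a numerical illustration of Proposition 4.31, consider the joint density f(x, y) = x + y on the unit square. This density is our own choice for the sketch and does not come from the text; the integrals are approximated with a simple midpoint rule, and all function names are hypothetical. For this density a direct calculation gives E(X | Y = y) = (1/3 + y/2)/(1/2 + y) for y ∈ [0, 1], which the code reproduces approximately.

```python
def f_XY(x, y):
    """Illustrative joint density on the unit square (an assumption of this sketch)."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def integrate(func, a, b, n=400):
    """Midpoint rule approximation of the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (k + 0.5) * step) for k in range(n))

def f_Y(y):
    """Marginal density of Y, obtained by integrating out x."""
    return integrate(lambda x: f_XY(x, y), 0.0, 1.0)

def h(x, y):
    """Conditional density of X given Y, set to 0 where f_Y vanishes."""
    fy = f_Y(y)
    return f_XY(x, y) / fy if fy != 0 else 0.0

def cond_exp_X_given(y):
    """Value of E(X | Y) on {Y = y}: the integral of x h(x, y) dm(x)."""
    return integrate(lambda x: x * h(x, y), 0.0, 1.0)

for y in (0.0, 0.5, 1.0):
    exact = (1 / 3 + y / 2) / (1 / 2 + y)
    print(y, cond_exp_X_given(y), exact)
```

The approximation error comes only from the midpoint rule and shrinks as `n` grows.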


Exercise 4.16 Let f_{X,Y}(x, y) be the bivariate normal density given in
Example 3.16. Find a formula for the corresponding conditional density
h(x, y) and use it to compute E(X | Y).


Jensen’s inequality 


The next property of expectation requires some facts concerning convex 
functions. Many common inequalities have their origin in the notion of 
convexity. First we recall the definition of a convex function. 




Definition 4.32 
A function φ : (a, b) → R, where -∞ ≤ a < b ≤ ∞, is called convex if the
inequality

\[
\varphi(\lambda x + (1 - \lambda) y) \le \lambda \varphi(x) + (1 - \lambda) \varphi(y)
\]

holds whenever x, y ∈ (a, b) and 0 ≤ λ ≤ 1.


Such functions have right- and left-hand derivatives at each point in the
open interval (a, b). We recall some of their properties, including a proof
of this well-known fact.

Suppose that x, y, z ∈ (a, b) and x < y < z. Taking λ = (z - y)/(z - x) we have
y = λx + (1 - λ)z, so the convexity of φ gives

\[
\varphi(y) \le \frac{z - y}{z - x} \varphi(x) + \frac{y - x}{z - x} \varphi(z).
\]

Rearranging, we get

\[
\frac{\varphi(y) - \varphi(x)}{y - x} \le \frac{\varphi(z) - \varphi(y)}{z - y}.
\]

The next exercise shows that the one-sided derivatives of φ exist and are
finite.


Exercise 4.17 Show that if φ : (a, b) → R is convex and h > 0 with
x - h, x + h ∈ (a, b), then

\[
\frac{\varphi(x) - \varphi(x - h)}{h} \le \frac{\varphi(x + h) - \varphi(x)}{h}.
\]

Explain why the ratio (1/h)[φ(x + h) - φ(x)] decreases as h ↘ 0, and is
bounded below by a constant. Similarly, explain why (1/h)[φ(x) - φ(x - h)]
increases as h ↘ 0, and is bounded above by a constant.


This exercise shows that the right- and left-derivatives

\[
\varphi'_+(x) = \lim_{h \searrow 0} \frac{\varphi(x + h) - \varphi(x)}{h}, \qquad
\varphi'_-(x) = \lim_{h \searrow 0} \frac{\varphi(x) - \varphi(x - h)}{h}
\tag{4.11}
\]

are well defined for each x ∈ (a, b). We also obtain

\[
\varphi'_-(x) \le \varphi'_+(x).
\]

Moreover, for any x < y in (a, b)

\[
\varphi'_+(x) \le \frac{\varphi(y) - \varphi(x)}{y - x} \le \varphi'_-(y),
\]

which ensures that both one-sided derivatives are non-decreasing on (a, b).
Since φ has finite one-sided derivatives at each point, it is a continuous
function on (a, b).


Lemma 4.33 

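Any convex function $\phi : (a,b) \to \mathbb{R}$ is the supremum of some sequence of
affine functions $L_n : \mathbb{R} \to \mathbb{R}$ of the form $L_n(x) = a_n x + b_n$ for $x \in \mathbb{R}$, where
$a_n, b_n \in \mathbb{R}$ and $n = 1, 2, \ldots$.


Proof Consider the set of rational numbers in $(a,b)$. It is a countable
set, which we can therefore write as a sequence $q_1, q_2, \ldots$. For each
$n = 1, 2, \ldots$ we take any $a_n \in [\phi'_-(q_n), \phi'_+(q_n)]$, put $b_n = \phi(q_n) - a_n q_n$ and
consider the straight line $a_n x + b_n$ for $x \in \mathbb{R}$. Clearly, $a_n q_n + b_n = \phi(q_n)$ for each
$n = 1, 2, \ldots$, and it follows from the above inequalities that $a_n x + b_n \le \phi(x)$
for each $x \in (a,b)$ and for each $n = 1, 2, \ldots$. As a result, for each $x \in (a,b)$

\[ \sup_{n = 1, 2, \ldots} (a_n x + b_n) \le \phi(x). \]

Now, for any $x \in (a,b)$ we can take a subsequence $q_{i_1}, q_{i_2}, \ldots$ of the
rationals in $(a,b)$ such that $\lim_{n \to \infty} q_{i_n} = x$. Then, since $\phi$ is continuous,

\[ \lim_{n \to \infty} (a_{i_n} q_{i_n} + b_{i_n}) = \lim_{n \to \infty} \phi(q_{i_n}) = \phi(x). \]

Moreover, the slopes $a_{i_n}$ stay bounded, because the one-sided derivatives
are finite and non-decreasing, so $a_{i_n}(x - q_{i_n}) \to 0$ and hence
$\lim_{n \to \infty} (a_{i_n} x + b_{i_n}) = \phi(x)$. It follows that for each $x \in (a,b)$

\[ \sup_{n = 1, 2, \ldots} (a_n x + b_n) = \phi(x). \]

This completes the proof. □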
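The construction in Lemma 4.33 can be tried out numerically. The following is only a sketch under assumptions that are not in the text: the convex function is taken to be $\phi(x) = x^2$ (so both one-sided derivatives equal $2x$), and a finite grid of rationals stands in for the enumeration $q_1, q_2, \ldots$. The names `phi`, `dphi`, `tangent_lines` and `sup_of_lines` are hypothetical helpers.

```python
from fractions import Fraction


def phi(x):
    """Illustrative convex function (an assumption): phi(x) = x^2."""
    return x * x


def dphi(x):
    """Derivative of phi; for this smooth phi the one-sided derivatives coincide."""
    return 2 * x


def tangent_lines(points):
    """Pairs (a_n, b_n) with a_n = phi'(q_n) and b_n = phi(q_n) - a_n * q_n."""
    return [(dphi(q), phi(q) - dphi(q) * q) for q in points]


def sup_of_lines(lines, x):
    """Largest value of a * x + b over the given affine functions."""
    return max(a * x + b for a, b in lines)


# Finite stand-in for an enumeration of the rationals: multiples of 1/8 in (-2, 2).
grid = [Fraction(k, 8) for k in range(-15, 16)]
lines = tangent_lines(grid)

x = Fraction(1, 3)
# Every line lies below phi, so the gap is non-negative; it shrinks as the grid is refined.
gap = phi(x) - sup_of_lines(lines, x)
print(gap)
```

With a finer grid the gap tends to 0, which is the content of the second half of the proof.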


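Proposition 4.34 (Jensen's inequality)

Let $-\infty \le a < b \le \infty$. Suppose that $X : \Omega \to (a,b)$ is an integrable
random variable on $(\Omega, \mathcal{F}, P)$ and take a $\sigma$-field $\mathcal{G} \subset \mathcal{F}$. If $\phi : (a,b) \to \mathbb{R}$
is a convex function such that the random variable $\phi(X)$ is also integrable,
then

\[ \phi(E(X \mid \mathcal{G})) \le E(\phi(X) \mid \mathcal{G}). \]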


Proof We must show that $E(X \mid \mathcal{G}) \in (a,b)$, P-a.s. before we can even
write $\phi(E(X \mid \mathcal{G}))$ on the left-hand side of the inequality. By assumption,
$X > a$. The set $B = \{E(X \mid \mathcal{G}) \le a\}$ is $\mathcal{G}$-measurable, so by the definition of
conditional expectation,

\[ 0 \le E(1_B (X - a)) = E(1_B E(X - a \mid \mathcal{G})) = E(1_B (E(X \mid \mathcal{G}) - a)) \le 0, \]

implying that $E(1_B (X - a)) = 0$. Because $X > a$, it means that $P(B) = 0$, or
in other words, $E(X \mid \mathcal{G}) > a$, P-a.s. We can show similarly that $E(X \mid \mathcal{G}) < b$,
P-a.s.

By Lemma 4.33, since $\phi$ is the supremum of a sequence of affine functions
$L_n(x) = a_n x + b_n$, we have $a_n X + b_n \le \phi(X)$ for each $n = 1, 2, \ldots$,
hence by the linearity and positivity of conditional expectation (Exercise
4.13) we obtain

\[ a_n E(X \mid \mathcal{G}) + b_n \le E(\phi(X) \mid \mathcal{G}) \]

for each $n = 1, 2, \ldots$, and taking the supremum over $n$ completes the proof. □


Applying this to the trivial $\sigma$-field $\mathcal{G} = \{\Omega, \emptyset\}$, we have the following
corollary.


Corollary 4.35
Suppose the random variable $X$ is integrable, $\phi$ is convex on an open interval
containing the range of $X$ and $\phi(X)$ is also integrable. Then

\[ \phi(E(X)) \le E(\phi(X)). \]


The next special case is equally important for the applications we have
in mind. It follows from Jensen's inequality by taking $\phi(x) = x^2$.


Corollary 4.36
If $X^2$ is integrable, then $(E(X \mid \mathcal{G}))^2 \le E(X^2 \mid \mathcal{G})$.
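Corollary 4.36 can be checked by hand on a finite probability space, where conditioning on a $\sigma$-field generated by a partition reduces to averaging over each block. The sketch below assumes such a setting (six equally likely outcomes, two blocks); the helper `cond_exp` and the sample data are illustrative assumptions, not part of the text.

```python
from fractions import Fraction as Fr


def cond_exp(x, p, blocks):
    """E(X | G) for G generated by a partition into blocks of positive probability.

    On each block the result is the probability-weighted average of x.
    """
    out = [None] * len(x)
    for block in blocks:
        mass = sum(p[w] for w in block)
        average = sum(x[w] * p[w] for w in block) / mass
        for w in block:
            out[w] = average
    return out


p = [Fr(1, 6)] * 6                      # six equally likely outcomes (assumption)
x = [Fr(v) for v in (1, 2, 3, 4, 5, 6)]
blocks = [[0, 1, 2], [3, 4, 5]]         # partition generating G (assumption)

lhs = [v * v for v in cond_exp(x, p, blocks)]    # (E(X | G))^2
rhs = cond_exp([v * v for v in x], p, blocks)    # E(X^2 | G)
jensen_holds = all(l <= r for l, r in zip(lhs, rhs))
print(jensen_holds)
```

Replacing the squaring step by any other convex function gives the same kind of check for Proposition 4.34.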


4.4 The inner product space $L^2(P)$


We turn to some unfinished business: establishing the existence of conditional
expectation in the general setting of Definition 4.20. We do this first
for square-integrable random variables, that is, random variables $X$ such
that $E(X^2)$ is finite.

We shall identify random variables which are equal to one another P-a.s.
Given two random variables $X, Y$, observe that if $a \in \mathbb{R}$ and both
$E(X^2)$ and $E(Y^2)$ are finite, then $E((aX)^2) = a^2 E(X^2)$ and
$E((X+Y)^2) \le 2(E(X^2) + E(Y^2))$ are also finite. This shows that the collection of
such random variables is a vector space. Moreover, by the Schwarz inequality, see
Lemma 3.49, $|E(XY)| \le \sqrt{E(X^2) E(Y^2)}$ is finite too. We introduce some
notation to reflect this.




Definition 4.37 
We denote by

\[ L^2(P) = L^2(\Omega, \mathcal{F}, P) \]

the vector space of all square-integrable random variables on a probability
space $(\Omega, \mathcal{F}, P)$. For any $X, Y \in L^2(P)$ we define their inner product by

\[ \langle X, Y \rangle = E(XY). \]


Remark 4.38 

In abstract texts on functional analysis it is customary to eliminate the
non-uniqueness due to identifying random variables equal to one another P-a.s.,
by considering the vector space of equivalence classes of elements of $L^2(P)$
under the equivalence relation

\[ X \sim Y \quad \text{if and only if} \quad X = Y, \ P\text{-a.s.} \]

We prefer to work directly with functions rather than equivalence classes
for the results we require.


We note some immediate properties of the inner product.
(i) The inner product is linear in its first argument: given $a, b \in \mathbb{R}$ and
$X_1, X_2, Y \in L^2(P)$, we have (by the linearity of expectation)

\[ \langle aX_1 + bX_2, Y \rangle = a \langle X_1, Y \rangle + b \langle X_2, Y \rangle. \]

(ii) The inner product is symmetric:

\[ \langle Y, X \rangle = E(YX) = E(XY) = \langle X, Y \rangle. \]

Hence it is also linear in its second argument.
(iii) The inner product is non-negative:

\[ \langle X, X \rangle = E(X^2) \ge 0. \]

(iv) $\langle X, X \rangle = 0$ if and only if $X = 0$, P-a.s.; this is so because
$\langle X, X \rangle = E(X^2)$, and $E(X^2) = 0$ if and only if $X = 0$, P-a.s., see
Proposition 1.36.
Since we do not distinguish between random variables equal to one another
P-a.s., we interpret this as saying that $\langle X, X \rangle = 0$ means $X = 0$, and with
this proviso the last two properties together say that the inner product is
positive definite.
The inner product induces a notion of 'length', or norm, on vectors in
$L^2(P)$.




Definition 4.39 
For any $X \in L^2(P)$ define the $L^2$-norm as

\[ \|X\|_2 = \sqrt{\langle X, X \rangle} = \sqrt{E(X^2)}. \]


This should look familiar. For $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ the Euclidean norm

\[ \|x\| = \sqrt{\sum_{i=1}^n x_i^2} \]

is related in the same manner to the scalar product

\[ \langle x, y \rangle = \sum_{i=1}^n x_i y_i. \]

The sum is now replaced by an integral.
The $L^2$-norm shares the following properties of the Euclidean norm.
(i) For any $X \in L^2(P)$

\[ \|X\|_2 \ge 0, \]

with $\|X\|_2 = 0$ if and only if $X = 0$, P-a.s.
(ii) For any $a \in \mathbb{R}$ and $X \in L^2(P)$

\[ \|aX\|_2 = |a| \, \|X\|_2. \]

(iii) For any $X, Y \in L^2(P)$

\[ \|X + Y\|_2 \le \|X\|_2 + \|Y\|_2. \]

The first two claims are obvious, while the third follows from the Schwarz
inequality, using the definition of the norm:

\[ \|X + Y\|_2^2 = E((X + Y)^2) = E(X^2) + 2E(XY) + E(Y^2) \le \|X\|_2^2 + 2\|X\|_2 \|Y\|_2 + \|Y\|_2^2 = (\|X\|_2 + \|Y\|_2)^2. \]
The Schwarz inequality is key to many properties of the inner product
space $L^2(P)$. First, since a constant random variable is square-integrable,
the Schwarz inequality implies that

\[ (E(|X|))^2 = (E(|1 \cdot X|))^2 \le \|1\|_2^2 \, \|X\|_2^2 = E(X^2), \]

so $E(|X|)$ must be finite if $E(X^2)$ is. In other words, any square-integrable $X$
is also an integrable random variable, hence $E(X)$ is well-defined for each
$X \in L^2(P)$.
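On a finite probability space the inner product and norm are weighted sums, so the inequalities above can be verified directly. The sketch below assumes a three-point space with probabilities `p` and two arbitrary sample variables; `inner` and `norm2` are hypothetical helper names.

```python
import math


def inner(x, y, p):
    """<X, Y> = E(XY) on a finite probability space with weights p."""
    return sum(xi * yi * pi for xi, yi, pi in zip(x, y, p))


def norm2(x, p):
    """L^2-norm: square root of E(X^2)."""
    return math.sqrt(inner(x, x, p))


p = [0.2, 0.3, 0.5]          # probabilities (assumption)
x = [1.0, -2.0, 3.0]         # sample values of X (assumption)
y = [4.0, 0.0, -1.0]         # sample values of Y (assumption)

schwarz_ok = abs(inner(x, y, p)) <= norm2(x, p) * norm2(y, p)
triangle_ok = norm2([a + b for a, b in zip(x, y)], p) <= norm2(x, p) + norm2(y, p)
# E(|X|) <= ||X||_2, i.e. square-integrable implies integrable.
mean_abs = sum(abs(xi) * pi for xi, pi in zip(x, p))
l1_below_l2 = mean_abs <= norm2(x, p)
print(schwarz_ok, triangle_ok, l1_below_l2)
```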

Second, the Schwarz inequality implies continuity of the inner product 
and L?-norm. 




Definition 4.40 
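We say that $f : L^2(P) \to \mathbb{R}$ is norm continuous if for any $X \in L^2(P)$ and
any sequence $X_1, X_2, \ldots \in L^2(P)$

\[ \lim_{n \to \infty} \|X_n - X\|_2 = 0 \quad \text{implies} \quad \lim_{n \to \infty} |f(X_n) - f(X)| = 0. \]


Exercise 4.18 Show that the maps $X \mapsto \langle X, Y \rangle$ and $X \mapsto \|X\|_2$ are
norm continuous functions.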


For our purposes, the most important property of $L^2(P)$ is its completeness.
The terminology is borrowed from the real line: recall that $x_1, x_2, \ldots \in \mathbb{R}$
is called a Cauchy sequence if $\sup_{m,n \ge k} |x_n - x_m| \to 0$ as $k \to \infty$. The key
property that distinguishes $\mathbb{R}$ from $\mathbb{Q}$ is that $\mathbb{R}$ is complete while $\mathbb{Q}$ is not:
every Cauchy sequence $x_1, x_2, \ldots \in \mathbb{R}$ has a limit $\lim_{n \to \infty} x_n \in \mathbb{R}$, but this is
not the case in $\mathbb{Q}$. For example, take any Cauchy sequence $r_1, r_2, \ldots \in \mathbb{Q}$ of
rationals with $\lim_{n \to \infty} r_n = \sqrt{2}$, which is not in $\mathbb{Q}$.

The definition of a Cauchy sequence and the notion of completeness also
make sense in $L^2(P)$.


Definition 4.41 
We say that $X_1, X_2, \ldots \in L^2(P)$ is a Cauchy sequence whenever

\[ \sup_{m,n \ge k} \|X_n - X_m\|_2 \to 0 \quad \text{as } k \to \infty. \]

By saying that $L^2(P)$ is complete we mean that for every Cauchy sequence
$X_1, X_2, \ldots \in L^2(P)$ there is an $X \in L^2(P)$ such that

\[ \|X_n - X\|_2 \to 0 \quad \text{as } n \to \infty. \]


Theorem 4.42 
$L^2(P)$ is complete.


The proof makes essential use of the first Fatou lemma (Lemma 1.41 (i)).
We leave the details to the end of the chapter, Section 4.6.


Exercise 4.19 Show that any convergent sequence $X_1, X_2, \ldots \in L^2(P)$
is a Cauchy sequence, that is, show that if $\lim_{n \to \infty} \|X_n - X\|_2 = 0$
for some $X \in L^2(P)$, then $X_1, X_2, \ldots$ is a Cauchy sequence.




Orthogonal projection and conditional expectation in $L^2(P)$


The conditional expectation of $X \in L^2(P)$ with respect to a $\sigma$-field $\mathcal{G} \subset \mathcal{F}$ is
given in Definition 4.20 as a $\mathcal{G}$-measurable random variable $E(X \mid \mathcal{G})$ such
that for all $B \in \mathcal{G}$

\[ E(1_B E(X \mid \mathcal{G})) = E(1_B X). \]

We denote the set of all $\mathcal{G}$-measurable square-integrable random variables
by $L^2(\mathcal{G}, P)$, and write $L^2(\mathcal{F}, P)$ instead of $L^2(P)$ when there is some
danger of ambiguity. Then $L^2(\mathcal{G}, P)$ is an example of a linear subspace of
$L^2(\mathcal{F}, P)$, that is, a subset $L^2(\mathcal{G}, P) \subset L^2(\mathcal{F}, P)$ such that $X, Y \in L^2(\mathcal{G}, P)$
implies $aX + bY \in L^2(\mathcal{G}, P)$ for all $a, b \in \mathbb{R}$. The inner product and norm
for any $X, Y \in L^2(\mathcal{G}, P)$ coincide with those in $L^2(\mathcal{F}, P)$ and can be denoted
by the same symbols $\langle X, Y \rangle$ and $\|X\|_2$.

Since Theorem 4.42 applies to the family of square-integrable random
variables on any probability space, we know that $L^2(\mathcal{G}, P)$ is also complete.
It is often useful to state this property slightly differently, using the notion
of a closed set.


Definition 4.43 
We say that a subset $C \subset L^2(\mathcal{F}, P)$ is closed whenever it has the following
property: for any sequence $X_1, X_2, \ldots \in C$ and $X \in L^2(\mathcal{F}, P)$

\[ \lim_{n \to \infty} \|X_n - X\|_2 = 0 \quad \text{implies} \quad X \in C. \]


Proposition 4.44 

For any $\sigma$-field $\mathcal{G} \subset \mathcal{F}$, the family $L^2(\mathcal{G}, P)$ of $\mathcal{G}$-measurable
square-integrable random variables is a closed subset of $L^2(\mathcal{F}, P)$.

Proof Suppose that $X_1, X_2, \ldots \in L^2(\mathcal{G}, P)$ and $\lim_{n \to \infty} \|X_n - X\|_2 = 0$
for some $X \in L^2(\mathcal{F}, P)$. By Exercise 4.19, it is a Cauchy sequence in
$L^2(\mathcal{F}, P)$. Because the norms in $L^2(\mathcal{G}, P)$ and $L^2(\mathcal{F}, P)$ coincide, it follows
that $X_1, X_2, \ldots$ is a Cauchy sequence in $L^2(\mathcal{G}, P)$. Because $L^2(\mathcal{G}, P)$ is
complete, there is a $Y \in L^2(\mathcal{G}, P)$ such that $\lim_{n \to \infty} \|X_n - Y\|_2 = 0$. To conclude
that $X \in L^2(\mathcal{G}, P)$ it remains to show that $X = Y$, P-a.s. This is so because

\[ 0 \le \|X - Y\|_2 = \|(X - X_n) - (Y - X_n)\|_2 \le \|X - X_n\|_2 + \|Y - X_n\|_2 \to 0 \]

as $n \to \infty$, hence $\|X - Y\|_2 = 0$. □


The analogy with the geometric structure of $\mathbb{R}^n$ can be taken further.
Using the centred random variables $X_c = X - E(X)$, $Y_c = Y - E(Y)$, we
can write the variance of $X$ as

\[ \mathrm{Var}(X) = E(X_c^2) = \|X_c\|_2^2 \]

and similarly for $Y$. Their covariance is given by

\[ \mathrm{Cov}(X, Y) = E(X_c Y_c) = \langle X_c, Y_c \rangle. \]

Thus, if we define the angle $\theta$ between two random variables $X, Y$ in
$L^2(\mathcal{F}, P)$ by setting

\[ \cos \theta = \frac{\langle X, Y \rangle}{\|X\|_2 \|Y\|_2} \]

(which makes sense as long as neither $X$ nor $Y$ is 0, P-a.s.), we recover
the correlation between non-constant random variables $X, Y \in L^2(\mathcal{F}, P)$ as
the cosine of the angle between the centred random variables $X_c, Y_c$:

\[ \rho_{X,Y} = \frac{\langle X_c, Y_c \rangle}{\|X_c\|_2 \|Y_c\|_2}. \]

In particular, $X$ and $Y$ are uncorrelated if and only if $\langle X_c, Y_c \rangle = 0$.

Clearly, as defined above, in general we have $\cos \theta = 0$ if $\langle X, Y \rangle = 0$.
It seems natural to use this to define orthogonality with respect to the inner
product.
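The identification of correlation with a cosine can be made concrete on a finite probability space. The sketch below assumes four equally likely outcomes and made-up sample values; `expect`, `centred`, `cos_angle` and `correlation` are hypothetical helper names.

```python
import math


def expect(x, p):
    """E(X) on a finite probability space."""
    return sum(xi * pi for xi, pi in zip(x, p))


def inner(x, y, p):
    """<X, Y> = E(XY)."""
    return expect([a * b for a, b in zip(x, y)], p)


def centred(x, p):
    """X_c = X - E(X)."""
    m = expect(x, p)
    return [xi - m for xi in x]


def cos_angle(x, y, p):
    """Cosine of the angle between X and Y; both must have non-zero norm."""
    return inner(x, y, p) / math.sqrt(inner(x, x, p) * inner(y, y, p))


def correlation(x, y, p):
    """Correlation as the cosine of the angle between the centred variables."""
    return cos_angle(centred(x, p), centred(y, p), p)


p = [0.25] * 4                 # four equally likely outcomes (assumption)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]       # y = 2x, so perfectly correlated with x
w = [1.0, -1.0, -1.0, 1.0]     # uncorrelated with x

print(correlation(x, y, p), correlation(x, w, p))
```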


Definition 4.45 
Whenever random variables $X, Y \in L^2(\mathcal{F}, P)$ satisfy $\langle X, Y \rangle = 0$, we say that
they are orthogonal.


The next two exercises show how the geometry of the vector space
$L^2(\mathcal{F}, P)$ reflects Euclidean geometry, even though $L^2(\mathcal{F}, P)$ is not
necessarily finite-dimensional.


Exercise 4.20 Prove the following Pythagoras theorem in $L^2(\mathcal{F}, P)$.
If $X, Y \in L^2(\mathcal{F}, P)$ and $\langle X, Y \rangle = 0$, then

\[ \|X + Y\|_2^2 = \|X\|_2^2 + \|Y\|_2^2. \]


Exercise 4.21 Prove the following parallelogram law in $L^2(\mathcal{F}, P)$.
For any $X, Y \in L^2(\mathcal{F}, P)$

\[ \|X + Y\|_2^2 + \|X - Y\|_2^2 = 2\|X\|_2^2 + 2\|Y\|_2^2. \]




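Exercise 4.22 Show that $X_n(\omega) = \sin n\omega$ and $Y_m(\omega) = \cos m\omega$ are
orthogonal in $L^2[-\pi, \pi]$ for any $m, n = 1, 2, \ldots$.


More generally, if $X_1, X_2, \ldots, X_n \in L^2(\mathcal{F}, P)$ are mutually orthogonal,
then the linearity of the inner product yields

\[ \Big\langle \sum_{i=1}^n X_i, \sum_{j=1}^n X_j \Big\rangle = \sum_{i,j=1}^n \langle X_i, X_j \rangle = \sum_{i=1}^n \langle X_i, X_i \rangle, \]

so that

\[ \Big\| \sum_{i=1}^n X_i \Big\|_2^2 = \sum_{i=1}^n \|X_i\|_2^2. \]

(With $n = 2$ we recover the Pythagoras theorem.)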

In $\mathbb{R}^3$ the nearest point to $(x, y, z)$ in the $(x, y)$-plane is its orthogonal
projection $(x, y, 0)$. We can write $(x, y, z) = (x, y, 0) + (0, 0, z)$ and note that
the vector $(0, 0, z)$ is orthogonal to $(x, y, 0)$, as their scalar product is 0.

We wish to define orthogonal projections in $L^2(\mathcal{F}, P)$ similarly, using
the inner product $\langle X, Y \rangle$. Suppose that $M$ is a closed linear subspace in
$L^2(\mathcal{F}, P)$; that is, $M$ is a closed subset of $L^2(\mathcal{F}, P)$ such that $aX + bY \in M$
for any $X, Y \in M$ and $a, b \in \mathbb{R}$. First we introduce the nearest point in $M$
to an $X \in L^2(\mathcal{F}, P)$. It is by definition the random variable $Y \in M$ whose
existence and uniqueness is asserted in the next theorem.


Theorem 4.46 (nearest point) 
Let $M$ be a closed linear subspace in $L^2(\mathcal{F}, P)$. For any $X \in L^2(\mathcal{F}, P)$ there
is a $Y \in M$ such that

\[ \|X - Y\|_2 = \inf \{ \|X - Z\|_2 : Z \in M \}. \]

Such a random variable $Y$ is unique to within equality P-a.s.


The proof is again deferred to the end of the chapter, Section 4.6.

Suppose that $X \in L^2(\mathcal{F}, P)$ and let $Y$ be its nearest point in $M$ in the
sense of Theorem 4.46. We claim that $X - Y$ is orthogonal to every $Z \in M$.
Indeed, for any $c \in \mathbb{R}$ we have $Y + cZ \in M$, hence

\[ \|X - Y\|_2^2 \le \|X - (Y + cZ)\|_2^2 = \|X - Y\|_2^2 - 2c \langle X - Y, Z \rangle + c^2 \|Z\|_2^2. \]

It follows that $2c \langle X - Y, Z \rangle \le c^2 \|Z\|_2^2$ for any $c \in \mathbb{R}$. As a result,
$-c \|Z\|_2^2 \le 2 \langle X - Y, Z \rangle \le c \|Z\|_2^2$ for any $c > 0$, which implies that
$\langle X - Y, Z \rangle = 0$, proving that $X - Y$ and $Z$ are orthogonal.




The converse is easy to check. 


Exercise 4.23 Let $M$ be a closed linear subspace in $L^2(\mathcal{F}, P)$. Show
that if $Y \in M$ satisfies $\langle X - Y, Z \rangle = 0$ for all $Z \in M$, then

\[ \|X - Y\|_2 = \inf \{ \|X - Z\|_2 : Z \in M \}. \]


Because of these properties, for any $X \in L^2(\mathcal{F}, P)$ its nearest point in $M$
is also called the orthogonal projection of $X$ onto $M$.

We already know that $L^2(\mathcal{G}, P)$ is a closed linear subspace in $L^2(\mathcal{F}, P)$.
This makes it possible to relate orthogonal projection onto $L^2(\mathcal{G}, P)$ to
conditional expectation.


Proposition 4.47 
For any $\sigma$-field $\mathcal{G} \subset \mathcal{F}$ and any $X \in L^2(\mathcal{F}, P)$, the orthogonal projection of
$X$ onto $L^2(\mathcal{G}, P)$ is P-a.s. equal to the conditional expectation $E(X \mid \mathcal{G})$.


Proof Let $Y$ be the orthogonal projection of $X$ onto $L^2(\mathcal{G}, P)$. Since
$Y \in L^2(\mathcal{G}, P)$, it is $\mathcal{G}$-measurable. Moreover, for any $B \in \mathcal{G}$ we have
$1_B \in L^2(\mathcal{G}, P)$, so $X - Y$ and $1_B$ are orthogonal,

\[ 0 = \langle 1_B, X - Y \rangle = E(1_B X) - E(1_B Y), \]

which means that

\[ E(1_B Y) = E(1_B X) = E(1_B E(X \mid \mathcal{G})). \]

We have shown that $Y = E(X \mid \mathcal{G})$, P-a.s. □

Because we have established the existence and uniqueness of the orthogonal
projection, this immediately gives the existence and uniqueness
(to within equality P-a.s.) of the conditional expectation $E(X \mid \mathcal{G})$ for any
square-integrable random variable $X$ and any $\sigma$-field $\mathcal{G} \subset \mathcal{F}$. For many
applications in finance this will suffice.
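The link between orthogonal projection and conditional expectation can be seen on a finite probability space, where $\mathcal{G}$ is generated by a partition and the $\mathcal{G}$-measurable variables are those constant on each block. The sketch below is an illustration under these assumptions; the data and the helpers `expect`, `cond_exp` and `sq_dist` are hypothetical.

```python
from fractions import Fraction as Fr


def expect(x, p):
    """E(X) on a finite probability space."""
    return sum(xi * pi for xi, pi in zip(x, p))


def cond_exp(x, p, blocks):
    """E(X | G) for G generated by a partition into blocks of positive probability."""
    out = [None] * len(x)
    for block in blocks:
        mass = sum(p[w] for w in block)
        average = sum(x[w] * p[w] for w in block) / mass
        for w in block:
            out[w] = average
    return out


def sq_dist(x, y, p):
    """Squared L^2-distance E((X - Y)^2)."""
    return expect([(a - b) ** 2 for a, b in zip(x, y)], p)


p = [Fr(1, 10), Fr(2, 10), Fr(3, 10), Fr(4, 10)]   # probabilities (assumption)
x = [Fr(1), Fr(4), Fr(2), Fr(9)]                   # values of X (assumption)
blocks = [[0, 1], [2, 3]]                          # partition generating G

y = cond_exp(x, p, blocks)

# Orthogonality: E(1_B (X - Y)) = 0 for each generating block B.
residuals = [sum(p[w] * (x[w] - y[w]) for w in block) for block in blocks]

# Any other G-measurable candidate is at least as far from X as Y is.
z = [Fr(5)] * 4
print(y, residuals, sq_dist(x, y, p), sq_dist(x, z, p))
```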


4.5 Existence of $E(X \mid \mathcal{G})$ for integrable $X$


In this section we construct $E(X \mid \mathcal{G})$ for any integrable random variable $X$
and $\sigma$-field $\mathcal{G} \subset \mathcal{F}$. The next result is a vital stepping stone in this task.

We observed in Exercise 4.18 that, for a fixed $Y \in L^2(P)$, the linear
map on $L^2(P)$ given by $X \mapsto \langle X, Y \rangle$ is norm continuous. Remarkably, all
continuous linear maps from $L^2(P)$ to $\mathbb{R}$ have this form.




Theorem 4.48 
If $L : L^2(P) \to \mathbb{R}$ is linear and norm continuous, then there exists (uniquely
to within equality P-a.s.) a $Y \in L^2(P)$ such that for all $X \in L^2(P)$

\[ L(X) = \langle X, Y \rangle = E(XY). \]

Proof Since $L$ is linear and norm continuous,

\[ M = \{ X \in L^2(P) : L(X) = 0 \} \]

is a closed linear subspace in $L^2(P)$. If $L(X) = 0$ for all $X \in L^2(P)$, then we
take $Y = 0$. Otherwise, there is an $X_0 \in L^2(P)$ such that $L(X_0) \ne 0$. Let $Z$ be
the orthogonal projection of $X_0$ onto $M$. It follows that $X_0 \ne Z$ and $X_0 - Z$ is
orthogonal to every random variable in $M$. We put

\[ E = \frac{X_0 - Z}{\|X_0 - Z\|_2} \]

and, for an arbitrary $X \in L^2(P)$,

\[ U = L(X) E - L(E) X. \]

Then $L(U) = L(X) L(E) - L(E) L(X) = 0$, so $U \in M$. As a result, since $\|E\|_2 = 1$,

\[ 0 = \langle U, E \rangle = L(X) - \langle X, L(E) E \rangle. \]

Hence $Y = L(E) E$ satisfies $L(X) = \langle X, Y \rangle$ for all $X \in L^2(P)$. This proves
the existence part.

To prove uniqueness, suppose that $V \in L^2(P)$ satisfies $\langle X, Y \rangle = \langle X, V \rangle$
for all $X \in L^2(P)$. Then $\langle X, Y - V \rangle = 0$ for all $X \in L^2(P)$. Apply this with
$X = Y - V$. Then $\langle Y - V, Y - V \rangle = E((Y - V)^2) = 0$, hence $Y = V$, P-a.s. by
Proposition 1.36. □


The set of all integrable random variables on a given probability space is 
a vector space due to the linearity of expectation. We continue to identify 
X and Y if they are equal to one another P-a.s., and define a natural norm 
on this vector space. 


Definition 4.49 
Let $(\Omega, \mathcal{F}, P)$ be a probability space. We denote by $L^1(P) = L^1(\Omega, \mathcal{F}, P)$
the vector space consisting of all integrable random variables, and define

\[ \|X\|_1 = E(|X|) \]

for any $X \in L^1(P)$. We say that $\|X\|_1$ is the $L^1$-norm of $X$.


Like the $L^2$-norm in the previous section, the $L^1$-norm satisfies the
following conditions.
(i) For any $X \in L^1(P)$

\[ \|X\|_1 \ge 0, \]

with $\|X\|_1 = 0$ if and only if $X = 0$, P-a.s.
(ii) For any $a \in \mathbb{R}$ and $X \in L^1(P)$

\[ \|aX\|_1 = |a| \, \|X\|_1. \]

(iii) For any $X, Y \in L^1(P)$

\[ \|X + Y\|_1 \le \|X\|_1 + \|Y\|_1. \]

The first two properties are obvious, while the last one follows by applying
expectation to both sides of the inequality $|X + Y| \le |X| + |Y|$.
In the same manner as for the $L^2$-norm, we can consider Cauchy
sequences in the $L^1$-norm, that is, sequences $X_1, X_2, \ldots \in L^1(P)$ such that

\[ \sup_{m,n \ge k} \|X_n - X_m\|_1 \to 0 \quad \text{as } k \to \infty, \]

and define completeness of $L^1(P)$ by the condition that every Cauchy
sequence $X_1, X_2, \ldots \in L^1(P)$ should converge to some $X \in L^1(P)$, that
is,

\[ \|X_n - X\|_1 \to 0 \quad \text{as } n \to \infty. \]


Theorem 4.50 
$L^1(P)$ is complete.


The proof is very similar to that of Theorem 4.42 and can be found in
Section 4.6.

Even though $L^1(P)$ and $L^2(P)$ have some similar features such as
completeness, the $L^1$-norm does not share the geometric properties of the
$L^2$-norm, as the next exercise confirms.


Exercise 4.24 Show that the parallelogram law stated in Exercise 4.21
fails for the $L^1$-norm, by considering the random variables
$X(\omega) = \omega$ and $Y(\omega) = 1 - \omega$ defined on the probability space $[0, 1]$
with Borel sets and Lebesgue measure. Explain why this means that
the $L^1$-norm is not induced by an inner product.


To compensate for the lack of an inner product in $L^1(P)$ we shall use
a result that comes close to representing a particular linear map on $L^1(P)$
in a manner resembling the representation in Theorem 4.48 of any norm
continuous linear map on $L^2(P)$ by the inner product. To introduce this
result, we need the following definition.


Definition 4.51
Given measures $\mu, \nu$ defined on the same $\sigma$-field $\mathcal{F}$ on $\Omega$, we write $\nu \ll \mu$
and say that $\nu$ is absolutely continuous with respect to $\mu$ if for any $A \in \mathcal{F}$

\[ \mu(A) = 0 \quad \text{implies} \quad \nu(A) = 0. \]


Example 4.52 

Any random variable $X$ with continuous distribution provides an example.
In that case, for any Borel set $A \in \mathcal{B}(\mathbb{R})$ we have $P_X(A) = \int_A f_X \, dm$, where
$f_X$ is the density of $X$ and $m$ is Lebesgue measure. Then $P_X \ll m$, since
$m(A) = 0$ implies $P_X(A) = \int 1_A f_X \, dm = 0$, as follows from Exercise 1.30.


Example 4.53

At the other extreme we may consider Lebesgue measure $m$ and the Dirac
measure $\delta_a$ for any $a \in \mathbb{R}$, defined in Example 1.12 and restricted to the
Borel sets. We have $m(\{a\}) = 0$ while $\delta_a(\{a\}) = 1$, so $\delta_a$ is not absolutely
continuous with respect to $m$. On the other hand, $m(\mathbb{R} \setminus \{a\}) = \infty$ while
$\delta_a(\mathbb{R} \setminus \{a\}) = 0$, so $m$ is not absolutely continuous with respect to $\delta_a$ either.


If $Z \in L^1(P)$ is a non-negative random variable such that $E(Z) = 1$, then
$Q(A) = \int_A Z \, dP$ for each $A \in \mathcal{F}$ defines a probability measure $Q$ on the
same $\sigma$-field $\mathcal{F}$ as $P$. It follows that $Q \ll P$. The following theorem shows
that the converse is also true.


Theorem 4.54 (Radon-Nikodym)

If $P, Q$ are probability measures defined on the same $\sigma$-field $\mathcal{F}$ on $\Omega$ and
such that $Q \ll P$, then there exists a random variable $Z \in L^1(P)$ such that
for each $A \in \mathcal{F}$

\[ Q(A) = \int_A Z \, dP. \]


The proof of this theorem, based on a brilliant argument due to John von
Neumann, is given in Section 4.6.




Exercise 4.25 Under the assumptions of Theorem 4.54, show that
the expectation of any random variable $X \in L^1(Q)$ with respect to $Q$
can be written as

\[ E_Q(X) = E_P(XZ). \tag{4.12} \]


The right-hand side of (4.12) resembles the inner product of $X$ and $Z$.
(We cannot write it as $\langle X, Z \rangle$ unless we know, in addition, that
$X, Z \in L^2(P)$.) This is the result which compensates for the lack of an inner
product behind the $L^1$-norm, as alluded to above. It enables us to establish the
existence of conditional expectation for any random variable in $L^1(P)$.
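On a finite sample space the Radon-Nikodym derivative is simply a ratio of point masses, which makes (4.12) easy to check. The following sketch assumes a three-point space with exact rational probabilities; the helper `density` is a hypothetical name, and setting the density to 0 at points of $P$-measure 0 is an arbitrary convention.

```python
from fractions import Fraction as Fr


def density(q, p):
    """Pointwise ratio q/p on a finite space, i.e. Z with Q(A) = E_P(1_A Z).

    Raises ValueError if Q charges a point that P does not, since then Q is
    not absolutely continuous with respect to P.
    """
    z = []
    for qi, pi in zip(q, p):
        if pi == 0:
            if qi != 0:
                raise ValueError("Q is not absolutely continuous with respect to P")
            z.append(Fr(0))      # arbitrary value on a P-null point
        else:
            z.append(qi / pi)
    return z


def expect(x, m):
    """Expectation of x under the probability weights m."""
    return sum(xi * mi for xi, mi in zip(x, m))


p = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]     # measure P (assumption)
q = [Fr(1, 4), Fr(1, 4), Fr(1, 2)]     # measure Q (assumption)
x = [Fr(3), Fr(-1), Fr(2)]             # a random variable (assumption)

z = density(q, p)
lhs = expect(x, q)                                    # E_Q(X)
rhs = expect([xi * zi for xi, zi in zip(x, z)], p)    # E_P(XZ)
print(z, lhs, rhs)
```

A pair such as a point mass against a measure that gives that point zero weight makes `density` raise, mirroring Example 4.53.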


Proposition 4.55 
For any $\sigma$-field $\mathcal{G} \subset \mathcal{F}$ and any random variable $X \in L^1(\mathcal{F}, P)$, the
conditional expectation $E(X \mid \mathcal{G})$ exists and is unique to within equality P-a.s.


Proof First suppose that $X$ is non-negative and $E(X) = 1$. The probability
measure $Q$ defined on the $\sigma$-field $\mathcal{G}$ as

\[ Q(A) = E(1_A X) \quad \text{for each } A \in \mathcal{G} \]

is absolutely continuous with respect to $P$ (to be precise, with respect to
the restriction of $P$ to the $\sigma$-field $\mathcal{G}$, denoted here by the same symbol $P$
by a slight abuse of notation). By the Radon-Nikodym theorem, we know
that there is a random variable $Z \in L^1(\mathcal{G}, P)$ such that

\[ Q(A) = E(1_A Z) \quad \text{for each } A \in \mathcal{G}. \]

We therefore have

\[ E(1_A X) = E(1_A Z) \quad \text{for each } A \in \mathcal{G}. \]

If $X \in L^1(\mathcal{F}, P)$ is non-negative but $E(X)$ is not necessarily equal to 1,
then we can apply the above to $\tilde{X} = \frac{X}{E(X)}$ so that $E(\tilde{X}) = 1$, obtain
$\tilde{Z} \in L^1(\mathcal{G}, P)$ for $\tilde{X}$ as above, and put $Z = E(X) \tilde{Z}$. This works when $E(X) > 0$.
If $E(X) = 0$, we simply take $Z = 0$. Finally, for an arbitrary $X \in L^1(\mathcal{F}, P)$
we write $X = X^+ - X^-$, where $X^+, X^- \in L^1(\mathcal{F}, P)$ are non-negative random
variables, obtain $Z^+$ and $Z^-$ for $X^+$ and, respectively, $X^-$ as above, and take
$Z = Z^+ - Z^-$.

This enables us to conclude that for any $X \in L^1(\mathcal{F}, P)$ there is a random
variable $Z \in L^1(\mathcal{G}, P)$ such that

\[ E(1_A X) = E(1_A Z) \quad \text{for each } A \in \mathcal{G}. \]

It follows from Definition 4.20 that $Z = E(X \mid \mathcal{G})$, P-a.s., which proves
the existence of conditional expectation as well as its uniqueness to within
equality P-a.s. □


The Radon-Nikodym theorem has much wider application, of course. If
$Q \ll P$, we refer to $Z \ge 0$ such that $Q(A) = \int_A Z \, dP$ for each $A \in \mathcal{F}$ as
the Radon-Nikodym derivative (often also referred to as the density) of
$Q$ with respect to $P$, and write $Z = \frac{dQ}{dP}$. In finance, the principal application
occurs when the probabilities $P, Q$ have the same collections of sets of
measure 0, so that $Q \ll P$ and $P \ll Q$. We then write $P \sim Q$ and say
that $P$ and $Q$ are equivalent probabilities. An important application of this
can be found, for example, in the fundamental theorem of asset pricing,
asserting that the lack of arbitrage is equivalent to the existence of a risk
neutral probability, see [DMFM] and [BSM].

Some elementary relationships between Radon-Nikodym derivatives appear
in the next exercise.


Exercise 4.26 Suppose that $P, Q, R$ are probabilities defined on the
same $\sigma$-field $\mathcal{F}$. Verify the following conditions.
(1) If $Q \ll P$, $R \ll P$ and $\lambda \in (0, 1)$, then $\lambda Q + (1 - \lambda) R \ll P$ and

\[ \frac{d(\lambda Q + (1 - \lambda) R)}{dP} = \lambda \frac{dQ}{dP} + (1 - \lambda) \frac{dR}{dP}. \]

(2) If $Q \ll P$ and $R \ll Q$, then $R \ll P$ and

\[ \frac{dR}{dP} = \frac{dR}{dQ} \frac{dQ}{dP}. \]

(3) If $P \sim Q$, then

\[ \frac{dP}{dQ} = \left( \frac{dQ}{dP} \right)^{-1}. \]

4.6 Proofs 


Theorem 4.42
$L^2(P)$ is complete.


Proof First, note that if $X_1, X_2, \ldots$ is a Cauchy sequence in $L^2(P)$, then we
can find $n_1$ such that

\[ \|X_k - X_l\|_2 \le \frac{1}{2} \quad \text{whenever } k, l \ge n_1. \]

Next, find $n_2 > n_1$ such that

\[ \|X_k - X_l\|_2 \le \frac{1}{2^2} \quad \text{whenever } k, l \ge n_2, \]

and continue in this fashion to find a sequence of natural numbers
$n_1 < n_2 < \cdots$ such that for each $i = 1, 2, \ldots$

\[ \|X_k - X_l\|_2 \le \frac{1}{2^i} \quad \text{whenever } k, l \ge n_i. \]

In particular, for every $i = 1, 2, \ldots$

\[ E(|X_{n_{i+1}} - X_{n_i}|) \le \|X_{n_{i+1}} - X_{n_i}\|_2 \le \frac{1}{2^i}. \]

This means that, starting with a Cauchy sequence in the $L^2$-norm, we have
a subsequence $X_{n_1}, X_{n_2}, \ldots$ for which

\[ E(|X_{n_{i+1}} - X_{n_i}|) \le \frac{1}{2^i} \quad \text{for each } i = 1, 2, \ldots. \]

Since $Y_i = |X_{n_{i+1}} - X_{n_i}|$ is a non-negative $\mathcal{F}$-measurable function on $\Omega$,
the monotone convergence theorem, applied to the partial sums $\sum_{i=1}^k Y_i$,
ensures that

\[ E\Big( \sum_{i=1}^\infty Y_i \Big) = \sum_{i=1}^\infty E(Y_i) \le 1. \]

This means that P-a.s. the series $\sum_{i=1}^\infty |X_{n_{i+1}} - X_{n_i}|$ converges in $\mathbb{R}$, hence
P-a.s. the series $\sum_{i=1}^\infty (X_{n_{i+1}} - X_{n_i})$ converges absolutely, and so, P-a.s., it
converges in $\mathbb{R}$. We put

\[ X = X_{n_1} + \sum_{i=1}^\infty (X_{n_{i+1}} - X_{n_i}) = \lim_{i \to \infty} X_{n_i} \]

on the subset of $\Omega$ on which $\sum_{i=1}^\infty (X_{n_{i+1}} - X_{n_i})$ converges, and $X = 0$ on the
subset of $P$-measure 0 on which it possibly does not converge.

Finally, we must show that $X_k$ also converges to $X$ in $L^2$-norm. First note
that, P-a.s.,

\[ |X_k - X|^2 = \lim_{i \to \infty} |X_k - X_{n_i}|^2 = \liminf_{i \to \infty} |X_k - X_{n_i}|^2. \]

So we can apply Fatou's lemma to obtain

\[ \|X_k - X\|_2^2 = E\Big( \liminf_{i \to \infty} |X_k - X_{n_i}|^2 \Big) \le \liminf_{i \to \infty} E\big( |X_k - X_{n_i}|^2 \big) = \liminf_{i \to \infty} \|X_k - X_{n_i}\|_2^2 \to 0 \quad \text{as } k \to \infty, \]

where the last step employs the fact that $X_1, X_2, \ldots$ is a Cauchy sequence
in the $L^2$-norm. □


Theorem 4.46 (nearest point) 
Let $M$ be a closed linear subspace in $L^2(\mathcal{F}, P)$. For any $X \in L^2(\mathcal{F}, P)$ there
is a $Y \in M$ such that

\[ \|X - Y\|_2 = \inf \{ \|X - Z\|_2 : Z \in M \}. \]

Such a random variable $Y$ is unique to within equality P-a.s.


Proof Let

\[ \delta = \inf \{ \|X - Z\|_2 : Z \in M \}. \]

There is a sequence $Y_1, Y_2, \ldots \in M$ such that $\delta \le \|X - Y_k\|_2 < \delta + \frac{1}{k}$ for
each $k = 1, 2, \ldots$. We will show that the $Y_k$ form a Cauchy sequence in
the $L^2$-norm and then use completeness and the fact that $M$ is closed to
obtain $Y \in M$ as a limit of the sequence $Y_k$.

The parallelogram law (Exercise 4.21), applied to $Y_n - X$ and $Y_m - X$ for
any $m, n = 1, 2, \ldots$, provides that

\[ \|Y_n + Y_m - 2X\|_2^2 + \|Y_n - Y_m\|_2^2 = 2\|Y_n - X\|_2^2 + 2\|Y_m - X\|_2^2. \]

Now, $\|Y_n - X\|_2^2 \to \delta^2$ and $\|Y_m - X\|_2^2 \to \delta^2$ as $m, n \to \infty$. Moreover,
$\|Y_n + Y_m - 2X\|_2^2 \to 4\delta^2$ as $m, n \to \infty$ because $\frac{1}{2}(Y_n + Y_m) \in M$ and

\[ 2\delta \le \|Y_n + Y_m - 2X\|_2 \le \|Y_n - X\|_2 + \|Y_m - X\|_2 < 2\delta + \frac{1}{n} + \frac{1}{m}. \]

This means that $\|Y_n - Y_m\|_2 \to 0$ as $m, n \to \infty$, showing that $Y_1, Y_2, \ldots$ is
a Cauchy sequence. By completeness, the sequence converges in the $L^2$-norm
to a random variable $Y \in L^2(\mathcal{F}, P)$ and, since $M$ is closed, $Y \in M$.
Finally, the continuity of the $L^2$-norm shows that $\|X - Y_k\|_2 \to \|X - Y\|_2$ as
$k \to \infty$, and this means that $\|X - Y\|_2 = \delta$.

To see that $Y$ is unique, take any $W \in M$ such that $\|X - W\|_2 = \delta$. Using
the parallelogram law with $Y - X$ and $W - X$ we then have

\[ \|Y + W - 2X\|_2^2 + \|Y - W\|_2^2 = 2\|Y - X\|_2^2 + 2\|W - X\|_2^2 = 4\delta^2, \]

while, since $\frac{1}{2}(Y + W) \in M$, it follows that $\|Y + W - 2X\|_2^2 \ge 4\delta^2$, so
$\|Y - W\|_2^2 = 0$ and therefore $Y = W$, P-a.s. □




Theorem 4.50
$L^1(P)$ is complete.


Proof The argument in the proof of Theorem 4.42, which shows that
$L^2(P)$ is complete, can be repeated in the case of $L^1(P)$, with the $L^1$-norm
instead of the $L^2$-norm and with the squares dropped in the final paragraph. □


Theorem 4.54 (Radon-Nikodym)

If $P, Q$ are probability measures defined on the same $\sigma$-field $\mathcal{F}$ on $\Omega$ and
such that $Q \ll P$, then there exists a random variable $Z \in L^1(P)$ such that
for each $A \in \mathcal{F}$

\[ Q(A) = \int_A Z \, dP. \]

Proof Consider a third probability measure defined for each $A \in \mathcal{F}$ as

\[ R(A) = \frac{1}{2} Q(A) + \frac{1}{2} P(A). \]

By the Schwarz inequality, see Lemma 3.49, for any $X \in L^2(R)$

\[ |E_Q(X)| \le E_Q(|X|) \le 2 E_R(|X|) = 2 E_R(1 \cdot |X|) \le 2 \sqrt{E_R(1^2) E_R(|X|^2)} = 2 \sqrt{E_R(X^2)} = 2 \|X\|_{2,R}, \]

where $\|X\|_{2,R}$ denotes the norm in $L^2(R)$. This means that $L : L^2(R) \to \mathbb{R}$
defined as $L(X) = \frac{1}{2} E_Q(X)$ for each $X \in L^2(R)$ is a norm continuous linear
map on $L^2(R)$. Therefore, by Theorem 4.48, there is a $U \in L^2(R)$ such that

\[ \frac{1}{2} E_Q(X) = E_R(XU) \quad \text{for each } X \in L^2(R). \]

Since $R = \frac{1}{2} Q + \frac{1}{2} P$, this can be written as

\[ E_Q(X(1 - U)) = E_P(XU) \quad \text{for each } X \in L^2(R). \tag{4.13} \]

Applying this to $X = 1_A$ for any $A \in \mathcal{F}$ gives $\frac{1}{2} Q(A) = E_R(1_A U)$, and since
$0 \le \frac{1}{2} Q(A) \le R(A)$, we have

\[ 0 \le E_R(1_A U) \le R(A). \]

Because this holds for any $A \in \mathcal{F}$, it follows that $0 \le U \le 1$, R-a.s. This
in turn implies that $0 \le U \le 1$, P-a.s. and therefore also Q-a.s. Moreover,
taking $X = 1_{\{U = 1\}}$ in (4.13), we get

\[ 0 = E_Q(1_{\{U = 1\}} (1 - U)) = E_P(1_{\{U = 1\}} U) = P(U = 1), \]

and since $Q \ll P$, we also have $Q(U = 1) = 0$. This means that $0 \le U < 1$,
P-a.s. and Q-a.s.

We put $Y_n = 1 + U + U^2 + \cdots + U^n$. For any $A \in \mathcal{F}$, taking $X = 1_A Y_n$,
which belongs to $L^2(R)$ because $U$ is bounded R-a.s. and therefore $Y_n$ is
bounded R-a.s., we get from (4.13) that

\[ E_Q(1_A (1 - U^{n+1})) = E_Q(1_A Y_n (1 - U)) = E_P(1_A Y_n U) = E_P(1_A (Y_{n+1} - 1)). \]

Since $0 \le U < 1$, Q-a.s., it follows that $1 - U^{n+1}$ is a Q-a.s. non-decreasing
sequence with limit 1. Moreover, since $0 \le U < 1$, P-a.s., it follows that
$Y_{n+1} - 1$ is a P-a.s. non-decreasing sequence, whose limit we denote by $Z$.
By monotone convergence, Theorem 1.31, this gives

\[ Q(A) = E_Q(1_A) = E_P(1_A Z), \]

completing the proof. □
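The steps of this proof can be traced on a finite sample space, where everything is explicit. The sketch below is an illustration under assumptions not in the text: on a finite space with point masses `p` and `q` (all positive), the function $U$ of the proof works out to $q/(p+q)$, and the partial sums $Y_{n+1} - 1 = U + \cdots + U^{n+1}$ increase to $U/(1-U) = q/p$. The helper name `partial_sum` is hypothetical.

```python
from fractions import Fraction as Fr

p = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]     # measure P (assumption)
q = [Fr(1, 4), Fr(1, 4), Fr(1, 2)]     # measure Q (assumption)

# With R = (P + Q)/2, the condition (1/2) E_Q(X) = E_R(XU) for all X
# forces U = q / (p + q) pointwise on a finite space.
u = [qi / (pi + qi) for pi, qi in zip(p, q)]

# Identity (4.13) pointwise: q (1 - U) = p U.
identity_413 = all(qi * (1 - ui) == pi * ui for pi, qi, ui in zip(p, q, u))


def partial_sum(ui, n):
    """U + U^2 + ... + U^n, i.e. Y_n - 1 in the notation of the proof."""
    return sum(ui ** k for k in range(1, n + 1))


z_limit = [qi / pi for pi, qi in zip(p, q)]      # the density dQ/dP
z_approx = [partial_sum(ui, 40) for ui in u]     # truncated geometric series
errors = [zl - za for zl, za in zip(z_limit, z_approx)]
print(identity_413, [float(e) for e in errors])
```

The errors are positive and decrease geometrically, matching the monotone convergence step at the end of the proof.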


5 


Sequences of random variables 


5.1 Sequences in $L^2(P)$

5.2 Modes of convergence for random variables 
5.3 Sequences of i.i.d. random variables 

5.4 Convergence in distribution 

5.5 Characteristic functions and inversion formula 
5.6 Limit theorems for weak convergence 

5.7 Central Limit Theorem 


Although financial markets can support only finitely many trades, finite se- 
quences of random variables are hardly sufficient for modelling financial 
reality. For instance, to model frequent trading we might consider the bi- 
nomial model with a large but finite number of short steps. However, it 
would be rather restrictive to place an arbitrary lower bound on the step 
length. We prefer to consider infinite sequences of random variables (and 
in due course families of random variables indexed by a continuous time 
parameter, as in [BSM]). In doing so we need to be aware that convergence 
questions for random variables are more complex than for a sequence of 
numbers. 


5.1 Sequences in $L^2(P)$


Continuing a theme developed in Chapter 4, we study sequences of
square-integrable random variables. The properties of the inner product allow us
to construct families of mutually orthogonal random variables, which can
play a role similar to that of an orthogonal basis in a finite-dimensional vector
space. Then we move our attention to approximating square-integrable random
variables on $[0, 1]$ by sequences of continuous functions, a useful result
because of the familiar properties of continuous functions.


Orthonormal sequences 


Recall from Definition 4.45 that $X, Y \in L^2(P)$ are called orthogonal if
$\langle X, Y \rangle = E(XY) = 0$. This leads naturally to the notion of an orthonormal
set, that is, a subset of $L^2(P)$ whose members are pairwise orthogonal
and each has $L^2$-norm 1. A natural question arises how to approximate
an arbitrary element of $L^2(P)$ by linear combinations of the elements of a
given finite orthonormal set.


Proposition 5.1
Given $Y \in L^2(P)$ and a finite orthonormal set $\{X_1, X_2, \ldots, X_n\}$ in $L^2(P)$, the
norm $\|Y - \sum_{i=1}^n a_i X_i\|_2$ attains its minimum when $a_i = \langle Y, X_i \rangle$ for $i = 1, \ldots, n$.


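Proof By definition, linearity and symmetry of the inner product, and
since the $X_i$ are orthonormal,

\[ \Big\| Y - \sum_{i=1}^n a_i X_i \Big\|_2^2 = \Big\langle Y - \sum_{i=1}^n a_i X_i, \; Y - \sum_{j=1}^n a_j X_j \Big\rangle \]
\[ = \|Y\|_2^2 - 2 \sum_{i=1}^n a_i \langle X_i, Y \rangle + \sum_{i=1}^n \sum_{j=1}^n a_i a_j \langle X_i, X_j \rangle \]
\[ = \|Y\|_2^2 - 2 \sum_{i=1}^n a_i \langle X_i, Y \rangle + \sum_{i=1}^n a_i^2 \]
\[ = \|Y\|_2^2 + \sum_{i=1}^n \big[ a_i^2 - 2 a_i \langle X_i, Y \rangle \big]. \]

Note that for each $i$

\[ [a_i - \langle X_i, Y \rangle]^2 = a_i^2 - 2 a_i \langle X_i, Y \rangle + \langle X_i, Y \rangle^2, \]

so that in each term of the sum on the right we can replace $a_i^2 - 2 a_i \langle X_i, Y \rangle$
by $[a_i - \langle X_i, Y \rangle]^2 - \langle X_i, Y \rangle^2$. In other words

\[ \Big\| Y - \sum_{i=1}^n a_i X_i \Big\|_2^2 = \|Y\|_2^2 - \sum_{i=1}^n \langle X_i, Y \rangle^2 + \sum_{i=1}^n [a_i - \langle X_i, Y \rangle]^2, \tag{5.1} \]

and the right-hand side attains its minimum if and only if $a_i = \langle X_i, Y \rangle$ for
each $i$. □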
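Identity (5.1) can be confirmed on a small example. The sketch below assumes a four-point sample space with uniform probabilities and two orthonormal random variables taking values $\pm 1$; the data and the helper names `inner` and `sq_error` are illustrative assumptions.

```python
from fractions import Fraction as Fr

P = [Fr(1, 4)] * 4                      # uniform probabilities (assumption)


def inner(x, y):
    """<X, Y> = E(XY) on the four-point space."""
    return sum(a * b * w for a, b, w in zip(x, y, P))


def sq_error(y, basis, coeffs):
    """Squared L^2-distance between y and the combination sum_i coeffs[i] * basis[i]."""
    approx = [sum(c * e[w] for c, e in zip(coeffs, basis)) for w in range(len(y))]
    diff = [a - b for a, b in zip(y, approx)]
    return inner(diff, diff)


X1 = [1, 1, 1, 1]
X2 = [1, 1, -1, -1]                     # orthonormal pair: <Xi, Xj> = 1 if i == j else 0
Y = [3, 1, 4, 1]

best = [inner(Y, X1), inner(Y, X2)]     # coefficients a_i = <Y, X_i>
min_err = sq_error(Y, [X1, X2], best)

other = [Fr(2), Fr(0)]                  # any other choice of coefficients
other_err = sq_error(Y, [X1, X2], other)

# Right-hand side of (5.1) evaluated at the coefficients in `other`.
rhs_51 = (inner(Y, Y) - sum(c * c for c in best)
          + sum((a - c) ** 2 for a, c in zip(other, best)))
print(best, min_err, other_err, rhs_51)
```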




This choice of coefficients leads to a very useful inequality when we
let $n \to \infty$ and consider an orthonormal sequence, that is, a sequence
$X_1, X_2, \ldots \in L^2(P)$ of random variables with $\|X_i\|_2 = 1$ for all $i = 1, 2, \ldots$
and with $\langle X_i, X_j \rangle = 0$ for all $i, j = 1, 2, \ldots$ such that $i \ne j$.


Corollary 5.2 (Bessel inequality)
Given $Y \in L^2(P)$ and an orthonormal sequence $X_1, X_2, \ldots \in L^2(P)$, we
have

\[ \sum_{i=1}^\infty \langle X_i, Y \rangle^2 \le \|Y\|_2^2. \tag{5.2} \]

Equality holds precisely when $\sum_{i=1}^n \langle X_i, Y \rangle X_i$ converges to $Y$ in $L^2$-norm,
i.e. when $\|Y - \sum_{i=1}^n \langle X_i, Y \rangle X_i\|_2 \to 0$ as $n \to \infty$.


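Proof Take $X_1, \ldots, X_n$ from the given orthonormal sequence. Putting
$a_i = \langle X_i, Y \rangle$ in (5.1), we can see that

\[ 0 \le \Big\| Y - \sum_{i=1}^n \langle X_i, Y \rangle X_i \Big\|_2^2 = \|Y\|_2^2 - \sum_{i=1}^n \langle X_i, Y \rangle^2. \]

Thus $\|Y\|_2^2$ is an upper bound for the increasing sequence of partial sums
$\sum_{i=1}^n \langle X_i, Y \rangle^2$, hence also for its limit $\sum_{i=1}^\infty \langle X_i, Y \rangle^2$.

The identity

\[ \Big\| Y - \sum_{i=1}^n \langle X_i, Y \rangle X_i \Big\|_2^2 = \|Y\|_2^2 - \sum_{i=1}^n \langle X_i, Y \rangle^2 \]

holds for each $n$, and so, if the partial sums $\sum_{i=1}^n \langle X_i, Y \rangle X_i$ converge to $Y$ in
$L^2$-norm, then

\[ 0 = \lim_{n \to \infty} \Big\| Y - \sum_{i=1}^n \langle X_i, Y \rangle X_i \Big\|_2^2 = \|Y\|_2^2 - \lim_{n \to \infty} \sum_{i=1}^n \langle X_i, Y \rangle^2, \tag{5.3} \]

which shows that $\sum_{i=1}^\infty \langle X_i, Y \rangle^2 = \|Y\|_2^2$.
Conversely, if we have equality in (5.2), then the right-hand side of (5.3)
is 0, and since the left-hand side is also 0, it means that $\sum_{i=1}^n \langle X_i, Y \rangle X_i$
converges to $Y$ in the $L^2$-norm as $n \to \infty$. □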


In the Euclidean space $\mathbb{R}^n$, the standard orthonormal basis $e_1, e_2, \dots, e_n$
provides the representation $x = \sum_{i=1}^n x_i e_i$ with $\langle x, e_i\rangle = x_i$, for each $x =
(x_1, x_2, \dots, x_n) \in \mathbb{R}^n$. The basis is maximal in the sense that we cannot add
further non-zero vectors to it and retain an orthonormal set. This idea can
be used to provide an analogue for a basis in $L^2(P)$.




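Definition 5.3
We say that $D \subset L^2(P)$ is a complete orthonormal set whenever

\[
\langle X, Y\rangle =
\begin{cases}
1 & \text{if } X = Y, \ P\text{-a.s.} \\
0 & \text{otherwise}
\end{cases}
\]

for any $X, Y \in D$, and $\langle X, Z\rangle = 0$ for all $X \in D$ implies that $Z = 0$, $P$-a.s.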


The case when there is a countable complete orthonormal set $\{E_1, E_2, \dots\}$
is of particular interest. We then say that $E_1, E_2, \dots$ is a complete
orthonormal sequence or orthonormal basis.

Given a sequence $a_1, a_2, \dots \in \mathbb{R}$ such that $\sum_{i=1}^{\infty} a_i^2$ converges, the partial
sums $Y_n = \sum_{i=1}^n a_i E_i$ satisfy $\|Y_n - Y_m\|_2^2 = \sum_{i=m+1}^n a_i^2$ for $m < n$ (by Pythagoras), and
this becomes arbitrarily small as $m, n \to \infty$. So $Y_1, Y_2, \dots$ is a Cauchy
sequence in the $L^2$-norm, and therefore by Theorem 4.42 there is a $Y \in L^2(P)$
such that

\[
\Bigl\|Y - \sum_{i=1}^n a_i E_i\Bigr\|_2 \to 0 \quad \text{as } n \to \infty. \tag{5.4}
\]

We define the sum of the infinite series $\sum_{i=1}^{\infty} a_i E_i$ in $L^2$-norm as $\sum_{i=1}^{\infty} a_i E_i =
Y$ (to within equality $P$-a.s.) whenever (5.4) holds (see also Remark 5.10).
In particular, when $a_i = \langle Y, E_i\rangle$, the Bessel inequality (5.2) ensures that
$\sum_{i=1}^{\infty} a_i^2 \le \|Y\|_2^2 < \infty$. This yields a representation of $Y$ analogous to that
for a basis in a finite-dimensional vector space.


Theorem 5.4
Given a complete orthonormal sequence $E_1, E_2, \dots \in L^2(P)$, every $Y \in
L^2(P)$ satisfies

\[
Y = \sum_{i=1}^{\infty} \langle Y, E_i\rangle E_i. \tag{5.5}
\]

This is known as the Fourier representation of $Y$. The $\langle Y, E_i\rangle$ are called
the Fourier coefficients of $Y$ relative to the complete orthonormal
sequence $E_1, E_2, \dots$ .


Remark 5.5 

The classical representation of functions by their Fourier series uses (5.5)
in the case of $\Omega = [-\pi, \pi]$ with Borel sets and uniform probability $P =
\frac{1}{2\pi} m_{[-\pi,\pi]}$, where $m_{[-\pi,\pi]}$ is the restriction of Lebesgue measure to $[-\pi, \pi]$,
and with the sequence of functions


\[
E_0(t) = 1, \qquad E_{2n-1}(t) = \sqrt{2}\,\cos nt, \qquad E_{2n}(t) = \sqrt{2}\,\sin nt
\]

for each $t \in [-\pi, \pi]$ and $n = 1, 2, \dots$, normalised so that each function has
$L^2$-norm 1 with respect to the uniform probability on $[-\pi, \pi]$. We are not going to prove the
completeness of this well-known sequence, but focus instead on an example
which has direct applications in stochastic calculus.


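Proposition 5.6 (Parseval identity)
An orthonormal sequence $E_1, E_2, \dots \in L^2(P)$ is complete if and only if for
each $Y \in L^2(P)$

\[
\|Y\|_2^2 = \sum_{n=1}^{\infty} \langle Y, E_n\rangle^2. \tag{5.6}
\]

Proof If $Y \in L^2(P)$ and $E_1, E_2, \dots \in L^2(P)$ is a complete orthonormal
sequence, then (5.5) holds, so

\[
\|Y\|_2^2 = \Bigl\langle \sum_{i=1}^{\infty} \langle Y, E_i\rangle E_i,\; \sum_{j=1}^{\infty} \langle Y, E_j\rangle E_j \Bigr\rangle
= \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \langle Y, E_i\rangle \langle Y, E_j\rangle \langle E_i, E_j\rangle
= \sum_{i=1}^{\infty} \langle Y, E_i\rangle^2
\]

since $\langle E_i, E_j\rangle = 1$ if $i = j$, and 0 otherwise. Conversely, if $\langle Z, E_i\rangle = 0$ for
each $i = 1, 2, \dots$, then (5.6) implies that $\|Z\|_2^2 = 0$, hence $Z = 0$, $P$-a.s., so
$E_1, E_2, \dots$ is a complete orthonormal sequence. □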


Exercise 5.1 Show that if $E_1, E_2, \dots \in L^2(P)$ is a complete orthonormal
sequence, then for any $X, Y \in L^2(P)$

\[
\langle X, Y\rangle = \sum_{i=1}^{\infty} \langle X, E_i\rangle \langle Y, E_i\rangle.
\]


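Example 5.7
Let $\Omega = [0, 1]$ with Borel sets and Lebesgue measure. A complete
orthonormal sequence is given by the Haar functions:

\[
\begin{aligned}
H_0 &= 1, \\
H_n &= 2^{j/2}\, 1_{\left(\frac{2k}{2^{j+1}}, \frac{2k+1}{2^{j+1}}\right]} - 2^{j/2}\, 1_{\left(\frac{2k+1}{2^{j+1}}, \frac{2k+2}{2^{j+1}}\right]} \quad \text{for } n = 1, 2, \dots,
\end{aligned}
\]

where $j = 0, 1, \dots$ and $k = 0, 1, \dots, 2^j - 1$ are such that $n = 2^j + k$. The
Haar functions are useful, for example in the construction of the Wiener
process, see [SCF].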

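The Haar functions form a complete orthonormal sequence. The
calculations showing that these functions are orthogonal to one another and each
has $L^2$-norm 1 are left as Exercise 5.2 below. We show that this sequence
is complete. Suppose that a square-integrable random variable $X$ on $[0, 1]$
is orthogonal to every member of the sequence of Haar functions. We need
to show that $X$ is zero $m$-a.s.

To do this, we first show by induction on $j = 0, 1, \dots$ that

\[
b_{j,k} = \int_{\left(\frac{k}{2^j}, \frac{k+1}{2^j}\right]} X \, dm = 0 \quad \text{for each } k = 0, 1, \dots, 2^j - 1. \tag{5.7}
\]

Since $X$ is orthogonal to $H_0$,

\[
0 = \langle X, H_0\rangle = \int_{(0,1]} X \, dm = b_{0,0},
\]

so (5.7) is true for $j = 0$. Now suppose that (5.7) holds for some $j =
0, 1, \dots$ . Then for any $k = 0, 1, \dots, 2^j - 1$

\[
0 = b_{j,k} = \int_{\left(\frac{k}{2^j}, \frac{k+1}{2^j}\right]} X \, dm
= \int_{\left(\frac{2k}{2^{j+1}}, \frac{2k+1}{2^{j+1}}\right]} X \, dm + \int_{\left(\frac{2k+1}{2^{j+1}}, \frac{2k+2}{2^{j+1}}\right]} X \, dm
= b_{j+1,2k} + b_{j+1,2k+1}
\]

and

\[
0 = \langle X, H_{2^j + k}\rangle
= 2^{j/2} \int_{\left(\frac{2k}{2^{j+1}}, \frac{2k+1}{2^{j+1}}\right]} X \, dm - 2^{j/2} \int_{\left(\frac{2k+1}{2^{j+1}}, \frac{2k+2}{2^{j+1}}\right]} X \, dm
= 2^{j/2} \left(b_{j+1,2k} - b_{j+1,2k+1}\right).
\]

It follows that

\[
b_{j+1,2k} = b_{j+1,2k+1} = 0 \quad \text{for each } k = 0, 1, \dots, 2^j - 1,
\]

completing the induction argument. As a result,

\[
\int_{\left(\frac{k}{2^j}, \frac{l}{2^j}\right]} X \, dm = 0
\]

for each $j = 0, 1, \dots$ and each $k, l = 0, 1, \dots, 2^j$ such that $k < l$. By
Exercise 3.42 we can conclude that $X = 0$, $m$-a.s. This shows that the Haar
functions form a complete orthonormal sequence.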
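A finite-dimensional version of this example can be explored numerically. The sketch below assumes a discretisation that is not in the text: $[0, 1]$ is cut into 8 cells of equal length, functions constant on each cell are stored as lists of 8 values, and the inner product is the average of the products. The names `haar`, `inner`, `CELLS` and the test vector `X` are illustrative. On this grid $H_0, \dots, H_7$ are all constant on cells, so orthonormality and the Parseval identity (5.6) reduce to finite sums.

```python
def haar(n, cells):
    """Values of H_n on the cells (i/cells, (i+1)/cells], i = 0, ..., cells-1.

    Assumes cells is a power of 2 with cells >= 2**(j+1), where n = 2**j + k,
    so that H_n is constant on each cell.
    """
    if n == 0:
        return [1.0] * cells
    j = n.bit_length() - 1          # n = 2**j + k with 0 <= k < 2**j
    k = n - 2 ** j
    amp = 2.0 ** (j / 2)
    values = []
    for i in range(cells):
        # Midpoint of cell i, measured in units of 2**-(j+1).
        t = (i + 0.5) / cells * 2 ** (j + 1)
        if 2 * k < t < 2 * k + 1:
            values.append(amp)
        elif 2 * k + 1 < t < 2 * k + 2:
            values.append(-amp)
        else:
            values.append(0.0)
    return values

def inner(x, y):
    """Inner product E(XY) for cell-wise constant functions on [0, 1]."""
    return sum(a * b for a, b in zip(x, y)) / len(x)

CELLS = 8
H = [haar(n, CELLS) for n in range(CELLS)]

# Gram matrix of H_0, ..., H_7: should be the identity matrix.
gram = [[inner(H[a], H[b]) for b in range(CELLS)] for a in range(CELLS)]

# Parseval identity for an arbitrary cell-wise constant X.
X = [0.3, -1.2, 2.0, 0.5, 0.0, 4.1, -0.7, 1.0]
coeffs = [inner(X, h) for h in H]
parseval_gap = inner(X, X) - sum(c * c for c in coeffs)

print(max(abs(gram[a][b] - (a == b)) for a in range(CELLS) for b in range(CELLS)))
print(parseval_gap)
```

Both printed numbers should be zero up to floating-point rounding. This is a check on a finite grid only and does not replace the completeness argument above.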




Exercise 5.2 Verify that the Haar functions $H_n$ for $n = 0, 1, \dots$ form
an orthonormal sequence, as claimed in Example 5.7.


Approximation by continuous functions 


From the construction of the Lebesgue integral we know that functions that are
integrable, and therefore also square-integrable, can be approximated by a
sequence of simple functions. Here we consider the special case of
square-integrable functions on $\Omega = [0, 1]$, equipped with Borel sets and Lebesgue
measure, and show that they can also be approximated by a sequence of
continuous functions.


Lemma 5.8

For every square-integrable function $f$ on $\Omega = [0, 1]$ (with Borel sets and
Lebesgue measure) there is a sequence of continuous functions $f_n$ on $[0, 1]$
approximating $f$ in the $L^2$-norm, that is,

\[
\|f - f_n\|_2 \to 0 \quad \text{as } n \to \infty.
\]


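Proof It suffices to show that for every square-integrable function $f$ on
$[0, 1]$ and for every $\varepsilon > 0$ there is a continuous function $g$ defined on $[0, 1]$
such that $\|f - g\|_2 \le \varepsilon$.

First, take $f(x) = 1_{(a,b)}(x)$ for any $x \in \mathbb{R}$, with $a, b \in \mathbb{R}$ such that $a < b$.
For any $\varepsilon > 0$ put

\[
g(x) = \frac{x - a + \varepsilon}{\varepsilon}\, 1_{[a-\varepsilon, a)}(x) + 1_{[a,b]}(x) + \frac{b - x + \varepsilon}{\varepsilon}\, 1_{(b, b+\varepsilon]}(x)
\]

for each $x \in \mathbb{R}$, which defines a continuous function. We denote the
restrictions of $f$ and $g$ to $[0, 1]$ by the same symbols $f, g$. Then

\[
\begin{aligned}
\|f - g\|_2^2 &= \int_{[0,1]} (f - g)^2 \, dm \le \int_{\mathbb{R}} (f - g)^2 \, dm \\
&= \int_{[a-\varepsilon, a]} g^2 \, dm + \int_{[b, b+\varepsilon]} g^2 \, dm \\
&= \int_{a-\varepsilon}^{a} \left(\frac{x - a + \varepsilon}{\varepsilon}\right)^2 dx + \int_{b}^{b+\varepsilon} \left(\frac{b - x + \varepsilon}{\varepsilon}\right)^2 dx = \frac{2}{3}\varepsilon.
\end{aligned}
\]

Since $\varepsilon > 0$ is arbitrary, $\|f - g\|_2$ can be made as small as we wish.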

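Next take $f = 1_A$ for a Borel set $A \subset [0, 1]$, and let $\varepsilon > 0$. By
Definition 1.18, there is a countable family of open intervals $J_1, J_2, \dots$ such that
$A \subset I = \bigcup_{k=1}^{\infty} J_k$ and

\[
m(A) \le m(I) \le m(A) + \left(\frac{\varepsilon}{3}\right)^2,
\]

hence

\[
\|1_A - 1_I\|_2^2 \le m(I \setminus A) = m(I) - m(A) \le \left(\frac{\varepsilon}{3}\right)^2.
\]

We can take the $J_k$ to be pairwise disjoint (otherwise any overlapping open
intervals in this family can be joined together to form a new countable
family of pairwise disjoint open intervals $J_k'$ such that $I = \bigcup_{k=1}^{\infty} J_k'$). Let
$I_K = \bigcup_{k=1}^{K} J_k$. Because the series $\sum_{k=1}^{\infty} m(J_k) = m(I) < \infty$ converges, there
is a $K$ such that

\[
\|1_I - 1_{I_K}\|_2^2 \le m(I \setminus I_K) = \sum_{k=K+1}^{\infty} m(J_k) \le \left(\frac{\varepsilon}{3}\right)^2.
\]

We already know that for each $k = 1, 2, \dots, K$ there is a non-negative
continuous function $g_k$ such that

\[
\|1_{J_k} - g_k\|_2 \le \frac{\varepsilon}{3K}.
\]

Putting $g = g_1 + \cdots + g_K$, we have

\[
\|1_{I_K} - g\|_2 \le \|1_{J_1} - g_1\|_2 + \cdots + \|1_{J_K} - g_K\|_2 \le \frac{\varepsilon}{3}.
\]

It follows that

\[
\begin{aligned}
\|f - g\|_2 &= \|(1_A - 1_I) + (1_I - 1_{I_K}) + (1_{I_K} - g)\|_2 \\
&\le \|1_A - 1_I\|_2 + \|1_I - 1_{I_K}\|_2 + \|1_{I_K} - g\|_2 \\
&\le \varepsilon.
\end{aligned}
\]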


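Next take any non-negative square-integrable function $f$ on $[0, 1]$. By
Proposition 1.28 there is a non-decreasing sequence of non-negative simple
functions $s_n$ such that $\lim_{n\to\infty} s_n = f$. It follows that $\lim_{n\to\infty} (f - s_n)^2 = 0$
and $0 \le (f - s_n)^2 \le f^2$, so by dominated convergence, see Theorem 1.43, we
have

\[
\|f - s_n\|_2^2 = \int_{[0,1]} (f - s_n)^2 \, dm \to 0 \quad \text{as } n \to \infty.
\]

This shows that for any $\varepsilon > 0$ there is a non-negative simple function $s$
such that

\[
\|f - s\|_2 \le \frac{\varepsilon}{2}.
\]

Writing the simple function as $s = \sum_{n=1}^{N} a_n 1_{A_n}$ for some $a_n > 0$ and some
Borel sets $A_n \subset [0, 1]$, we know that for each $n = 1, 2, \dots, N$ there is a
non-negative continuous function $g_n$ such that

\[
\|1_{A_n} - g_n\|_2 \le \frac{\varepsilon}{2N a_n}.
\]

Putting $g = a_1 g_1 + \cdots + a_N g_N$, we get

\[
\begin{aligned}
\|s - g\|_2 &= \|a_1 (1_{A_1} - g_1) + \cdots + a_N (1_{A_N} - g_N)\|_2 \\
&\le a_1 \|1_{A_1} - g_1\|_2 + \cdots + a_N \|1_{A_N} - g_N\|_2 \\
&\le a_1 \frac{\varepsilon}{2N a_1} + \cdots + a_N \frac{\varepsilon}{2N a_N} = \frac{\varepsilon}{2}.
\end{aligned}
\]

It follows that

\[
\|f - g\|_2 \le \|f - s\|_2 + \|s - g\|_2 \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\]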


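Finally, for an arbitrary square-integrable function $f$ on $[0, 1]$, we can
write it as $f = f^+ - f^-$, where $f^+, f^-$ are non-negative and square-integrable.
Then for any $\varepsilon > 0$ there are non-negative continuous functions $g^+, g^-$ such
that

\[
\|f^+ - g^+\|_2 \le \frac{\varepsilon}{2}, \qquad \|f^- - g^-\|_2 \le \frac{\varepsilon}{2},
\]

and for $g = g^+ - g^-$ we have

\[
\|f - g\|_2 \le \|f^+ - g^+\|_2 + \|f^- - g^-\|_2 \le \varepsilon,
\]

completing the proof. □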


The following exercise, which can be solved by applying Lemma 5.8, is 
used when constructing stochastic integrals in [SCF]. 


Exercise 5.3 Let $a < 0 < 1 < b$ and let $f$ be a Borel measurable
function on $[a, b]$ such that $\int_{[a,b]} f^2 \, dm < \infty$. Show that

\[
\lim_{h \to 0} \int_{[0,1]} (f(x) - f(x + h))^2 \, dm(x) = 0.
\]

Hint. Approximate $f$ by continuous functions in $L^2$-norm and use the
fact that every continuous function on a closed interval is uniformly
continuous.


5.2 Modes of convergence for random variables 


The partial sums of the Fourier representation of $X \in L^2(P)$ provide an
example of a sequence converging to $X$ in $L^2$-norm. We now explore how
the notion of convergence familiar from Euclidean space $\mathbb{R}^n$ may be
generalised to define several distinct modes of convergence for sequences of
random variables defined on the same probability space. We will describe
relationships between four distinct modes of convergence for random
variables:

(i) convergence in $L^2$-norm;

(ii) convergence in $L^1$-norm;

(iii) convergence $P$-almost surely;

(iv) convergence in probability.


By way of contrast, for a sequence $x^{(1)}, x^{(2)}, \dots \in \mathbb{R}^n$ the idea of
convergence is quite unambiguous: $x^{(k)} = (x_1^{(k)}, x_2^{(k)}, \dots, x_n^{(k)})$ converges to $x =
(x_1, x_2, \dots, x_n)$ as $k \to \infty$ if and only if $x_i^{(k)} \to x_i$ for each $i = 1, \dots, n$; in
other words, we have convergence for each coordinate. Convergence in $\mathbb{R}^n$
can also be captured in terms of the norm $\|x\|_2 = \sqrt{\sum_{i=1}^n x_i^2}$ or the norm
$\|x\|_1 = \sum_{i=1}^n |x_i|$ defined for each $x \in \mathbb{R}^n$.


Exercise 5.4 Show that the following conditions are equivalent:
(1) $x_i^{(k)} \to x_i$ as $k \to \infty$ for each $i = 1, \dots, n$;
(2) $\|x^{(k)} - x\|_2 = \sqrt{\sum_{i=1}^n (x_i^{(k)} - x_i)^2} \to 0$ as $k \to \infty$;
(3) $\|x^{(k)} - x\|_1 = \sum_{i=1}^n |x_i^{(k)} - x_i| \to 0$ as $k \to \infty$.

Since any $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ can be regarded as a function from
$\{1, 2, \dots, n\}$ to $\mathbb{R}$, assigning $x_i$ to each $i = 1, \dots, n$, the analogy between
the above norms defined on $\mathbb{R}^n$ and the $L^2$-norm and $L^1$-norm considered
in Chapter 4 is apparent. We now define convergence in these norms for
sequences of random variables on a probability space $(\Omega, \mathcal{F}, P)$.


Definition 5.9
We say that $X_n$ converges to $X$ in $L^2$-norm and write $X_n \xrightarrow{L^2} X$ if

\[
\|X_n - X\|_2^2 = E\bigl((X_n - X)^2\bigr) \to 0
\]

as $n \to \infty$.




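Remark 5.10

In Section 5.1 the sum of a series $\sum_{i=1}^{\infty} a_i E_i$ with $a_i \in \mathbb{R}$ and $E_i \in L^2(P)$
for $i = 1, 2, \dots$ was defined as a random variable $Y \in L^2(P)$ such that (5.4)
holds. In terms of convergence in $L^2$-norm this simply means that

\[
\sum_{i=1}^{n} a_i E_i \xrightarrow{L^2} Y \quad \text{as } n \to \infty.
\]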


Definition 5.11

We say that $X_n$ converges to $X$ in $L^1$-norm and write $X_n \xrightarrow{L^1} X$ if

\[
\|X_n - X\|_1 = E(|X_n - X|) \to 0
\]

as $n \to \infty$.


Recall that for any $Z \in L^2(P)$ the Schwarz inequality (Lemma 3.49)
implies $(E(|Z|))^2 \le E(Z^2)$ since $E(1) = 1$. Hence we have the following
inequality between the two norms:

\[
\|Z\|_1 \le \|Z\|_2. \tag{5.8}
\]

This means that if $\|X_n - X\|_2 \to 0$, then also $\|X_n - X\|_1 \to 0$ as $n \to \infty$, so
convergence in $L^2$-norm implies convergence in $L^1$-norm. The converse is
false in general, as the next example shows.


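Example 5.12

For each $n = 1, 2, \dots$ let $X_n = n 1_{A_n}$, where $P(A_n) = n^{-3/2}$. Then $X_n \xrightarrow{L^1} 0$
since $E(|X_n|) = n P(A_n) = \frac{1}{\sqrt{n}} \to 0$. But $X_n$ does not converge to 0 in
$L^2$-norm since $E(X_n^2) = n^2 P(A_n) = \sqrt{n} \to \infty$ as $n \to \infty$.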
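The two rates in this example are easy to tabulate. The following sketch only evaluates the closed-form expressions $n P(A_n)$ and $n^2 P(A_n)$ with $P(A_n) = n^{-3/2}$; the function names are illustrative and no particular probability space is simulated.

```python
# Example 5.12: X_n = n * 1_{A_n} with P(A_n) = n**(-3/2).

def l1_norm(n):
    """E|X_n| = n * P(A_n) = n**(-1/2)."""
    return n * n ** -1.5

def l2_norm_squared(n):
    """E(X_n^2) = n**2 * P(A_n) = n**(1/2)."""
    return n ** 2 * n ** -1.5

for n in (1, 100, 10_000, 1_000_000):
    # The first column tends to 0, the second grows without bound.
    print(n, l1_norm(n), l2_norm_squared(n))
```

In every row the $L^1$-norm is no larger than the $L^2$-norm (the square root of the last column), as inequality (5.8) requires.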


In further contrast to the situation in $\mathbb{R}^n$, convergence in either of these
norms is not the same as 'coordinatewise' convergence. For a sequence of
random variables $X_n$ the natural analogue of coordinatewise convergence
to a random variable $X$ is pointwise convergence, where $X = \lim_{n\to\infty} X_n$
means that, for every $\omega \in \Omega$, the real numbers $X_n(\omega)$ converge to the real
number $X(\omega)$. Recall, however, that random variables are identified if they
are equal to one another $P$-a.s. This leads to the following definition.


Definition 5.13

We say that $X_n$ converges to $X$ almost surely and write $X_n \xrightarrow{\text{a.s.}} X$ if there
is an $A \in \mathcal{F}$ with $P(A) = 0$ such that $X_n(\omega) \to X(\omega)$ as $n \to \infty$ for all
$\omega \in \Omega \setminus A$.




Example 5.14

Let $\Omega = [0, 1]$ with Borel sets and Lebesgue measure. The sequence
$X_n(\omega) = \omega^n$ converges to $X(\omega) = 0$ for all $\omega \in [0, 1)$, but not for $\omega = 1$.
Since the singleton $\{1\}$ has Lebesgue measure 0, it follows that $X_n \xrightarrow{\text{a.s.}} 0$.


One obvious question is whether convergence in $L^1$-norm or $L^2$-norm
implies convergence almost surely. The following counterexample shows
that we cannot always expect this.


Example 5.15
On $\Omega = [0, 1]$ with Borel sets and Lebesgue measure, construct $X_1 = 1_{[0,1)}$,
$X_2 = 1_{[0,\frac12)}$, $X_3 = 1_{[\frac12,1)}$, $X_4 = 1_{[0,\frac14)}$, $X_5 = 1_{[\frac14,\frac12)}$, $X_6 = 1_{[\frac12,\frac34)}$, $X_7 = 1_{[\frac34,1)}$,
$X_8 = 1_{[0,\frac18)}$, and so on. We have $X_n \xrightarrow{L^1} 0$ and $X_n \xrightarrow{L^2} 0$ because $\|X_n\|_1 \le
\|X_n\|_2 \to 0$ since the lengths of the intervals tend to 0. However, for each
$\omega \in [0, 1)$ there are infinitely many $n$ such that $X_n(\omega) = 1$, so $X_n(\omega)$ does
not converge to 0 for any $\omega$ in $[0, 1)$.
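The behaviour of this sequence can be traced explicitly. The sketch below assumes the indexing $n = 2^j + k$ with $0 \le k < 2^j$, so that $X_n$ is the indicator of $[k/2^j, (k+1)/2^j)$; this matches the first eight terms listed above, but the closed-form indexing and the names `interval` and `X` are introduced here for illustration only.

```python
from fractions import Fraction

def interval(n):
    """Interval [a, b) whose indicator is X_n in Example 5.15 (n = 1, 2, ...)."""
    j = n.bit_length() - 1          # block index: n = 2**j + k, 0 <= k < 2**j
    k = n - 2 ** j
    return Fraction(k, 2 ** j), Fraction(k + 1, 2 ** j)

def X(n, w):
    a, b = interval(n)
    return 1 if a <= w < b else 0

w = Fraction(1, 3)

# Indices n < 64 at which X_n(w) = 1: exactly one in each block 2**j <= n < 2**(j+1).
hits = [n for n in range(1, 64) if X(n, w) == 1]

# Interval lengths, equal to E|X_n| and to E(X_n^2), at the start of each block.
lengths = [interval(n)[1] - interval(n)[0] for n in (1, 2, 4, 8, 16, 32)]

print(hits)
print(lengths)
```

Each block of indices contributes one hit at the chosen point, so $X_n(\omega) = 1$ infinitely often, while the interval lengths halve from block to block.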


The next example shows that convergence almost surely does not neces- 
sarily imply convergence in either norm. 


Example 5.16

A sequence converging almost surely but failing to converge in $L^1$-norm is
built on $[0, 1]$ with Borel sets and Lebesgue measure by setting $X_n(\omega) =
n 1_{[0, \frac1n]}(\omega)$. Clearly $X_n(\omega) \to 0$ for each $\omega \in (0, 1]$ since then $X_n(\omega) = 0$ if $n$
is large enough. Hence $X_n \xrightarrow{\text{a.s.}} 0$. On the other hand, $E(X_n) = 1$ for all $n$, so
$X_n$ fails to converge to 0 in $L^1$-norm and therefore also in $L^2$-norm.


We can, however, make progress by imposing additional conditions. For 
example, the dominated convergence theorem (Theorem 1.43) can now be 
stated as follows. 


Theorem 5.17
If there is a random variable $Y \in L^1(P)$ such that $|X_n| \le Y$ for all $n$, and
$X_n \xrightarrow{\text{a.s.}} X$, then $X \in L^1(P)$ and $X_n \xrightarrow{L^1} X$.




We can characterise convergence almost surely by considering the set of
all $\omega \in \Omega$ where, for infinitely many $n$, the values $X_n(\omega)$ and $X(\omega)$ differ by
more than some given $\varepsilon > 0$. To make the notion that some phenomenon
occurs infinitely often more precise, observe that Exercise 1.33 suggests
an analogue for events of the liminf of a sequence of real numbers. This
motivates the following terminology.


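Definition 5.18
Given a sequence $A_1, A_2, \dots \in \mathcal{F}$ in some $\sigma$-field $\mathcal{F}$, define

\[
\liminf_{n\to\infty} A_n = \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} A_m,
\qquad
\limsup_{n\to\infty} A_n = \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m.
\]

Note that $\liminf_{n\to\infty} A_n$ and $\limsup_{n\to\infty} A_n$ both belong to $\mathcal{F}$.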


We say that $\omega$ is in $A_n$ infinitely often if $\omega \in \limsup_{n\to\infty} A_n$. For any
such $\omega$ there are infinitely many $n$ such that $\omega \in A_n$. Similarly, we say that
$\omega$ is in $A_n$ eventually if $\omega \in \liminf_{n\to\infty} A_n$. For any such $\omega$ we can find an
integer $k$ such that $\omega \in A_n$ for all $n \ge k$.

Our main interest is in $\limsup_{n\to\infty} A_n$. The next exercise applies the
second Fatou lemma (Lemma 1.41 (ii)) to a sequence of indicator functions of
sets (compare with Exercise 1.33).


Exercise 5.5 Show that for any sequence of events $A_1, A_2, \dots \in \mathcal{F}$

\[
P\bigl(\limsup_{n\to\infty} A_n\bigr) \ge \limsup_{n\to\infty} P(A_n).
\]


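Applying de Morgan's laws twice, we obtain

\[
\Omega \setminus \bigl(\limsup_{n\to\infty} A_n\bigr)
= \Omega \setminus \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m
= \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} (\Omega \setminus A_m)
= \liminf_{n\to\infty} (\Omega \setminus A_n). \tag{5.9}
\]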
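For a periodic sequence of events both limits can be computed from a single period, which gives a quick sanity check of (5.9). The sketch below is an illustration under assumptions not in the text: a hypothetical 12-point sample space and a sequence $A_n$ that depends only on $n$ modulo 3. For such a sequence every tail union equals the union over one period, and every tail intersection equals the intersection over one period.

```python
OMEGA = frozenset(range(12))

# Hypothetical periodic sequence of events: A_n depends only on n mod 3.
PATTERN = [frozenset({0, 1, 2, 3}), frozenset({2, 3, 4, 5}), frozenset({3, 5, 7})]

def A(n):
    return PATTERN[n % len(PATTERN)]

def lim_sup(event, period):
    """limsup of a periodic sequence: the union over one full period."""
    return frozenset().union(*(event(n) for n in range(period)))

def lim_inf(event, period):
    """liminf of a periodic sequence: the intersection over one full period."""
    return OMEGA.intersection(*(event(n) for n in range(period)))

def complement(event):
    """Sequence of complements n -> Omega \\ A_n."""
    return lambda n: OMEGA - event(n)

sup_A = lim_sup(A, 3)                         # points lying in A_n infinitely often
inf_A = lim_inf(A, 3)                         # points lying in A_n eventually
inf_complements = lim_inf(complement(A), 3)   # right-hand side of (5.9)

print(sorted(sup_A), sorted(inf_A), sorted(OMEGA - sup_A), sorted(inf_complements))
```

The last two printed sets coincide, as (5.9) predicts, and the liminf is contained in the limsup.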


The following characterisation of convergence almost surely is a simple 
application of the definitions and (5.9). 




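Proposition 5.19

Given an $\varepsilon > 0$ and random variables $X_1, X_2, \dots$ and $X$, write $A_{n,\varepsilon} =
\{|X_n - X| > \varepsilon\}$ for each $n = 1, 2, \dots$ . Then the following conditions are
equivalent:

(i) $X_n \xrightarrow{\text{a.s.}} X$;

(ii) $P\bigl(\limsup_{n\to\infty} A_{n,\varepsilon}\bigr) = 0$ for every $\varepsilon > 0$.

Proof Suppose (i) holds. Write $Y_n = |X_n - X|$. For each $\omega \in \Omega$ the
statement $Y_n(\omega) \to 0$ means that for every $\varepsilon > 0$ we can find $n = n(\varepsilon, \omega)$
such that $|Y_k(\omega)| \le \varepsilon$ for every $k \ge n$. If $Y_n \xrightarrow{\text{a.s.}} 0$, then such $n$ can be
found for each $\omega$ from a set of probability 1. Hence for any fixed $\varepsilon > 0$
we have $P\bigl(\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} B_{k,\varepsilon}\bigr) = 1$, where $B_{k,\varepsilon} = \{|Y_k| \le \varepsilon\}$. By definition,
$\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} B_{k,\varepsilon} = \liminf_{k\to\infty} B_{k,\varepsilon}$. But $B_{k,\varepsilon} = \Omega \setminus A_{k,\varepsilon}$, so by (5.9) we have
$\Omega \setminus \bigl(\limsup_{n\to\infty} A_{n,\varepsilon}\bigr) = \liminf_{n\to\infty} B_{n,\varepsilon}$. Since $P\bigl(\liminf_{k\to\infty} B_{k,\varepsilon}\bigr) = 1$, we
have $P\bigl(\limsup_{n\to\infty} A_{n,\varepsilon}\bigr) = 0$. As $\varepsilon > 0$ was arbitrary, (ii) is proved.
That (ii) implies (i) follows immediately by reversing the above steps. □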


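In many applications, convergence almost surely of a given sequence of
random variables is difficult to verify. However, observe that for fixed $\varepsilon > 0$
and with $A_{n,\varepsilon}$ as in Proposition 5.19, the sets $C_{n,\varepsilon} = \bigcup_{k=n}^{\infty} A_{k,\varepsilon}$ decrease as $n$
increases. So if $X_n \xrightarrow{\text{a.s.}} X$, then for all $\varepsilon > 0$,

\[
\lim_{n\to\infty} P\Bigl(\bigcup_{k=n}^{\infty} A_{k,\varepsilon}\Bigr) = \lim_{n\to\infty} P(C_{n,\varepsilon}) = P\Bigl(\bigcap_{n=1}^{\infty} C_{n,\varepsilon}\Bigr) = P\bigl(\limsup_{n\to\infty} A_{n,\varepsilon}\bigr) = 0.
\]

Replacing $\bigcup_{k=n}^{\infty} A_{k,\varepsilon}$ by the smaller set $A_{n,\varepsilon}$ provides the following weaker
mode of convergence, which is often easier to verify in practice.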


Definition 5.20
We say that a sequence of random variables $X_1, X_2, \dots$ converges to $X$ in
probability, and write $X_n \xrightarrow{P} X$, if for each $\varepsilon > 0$

\[
P(|X_n - X| > \varepsilon) \to 0 \quad \text{as } n \to \infty.
\]


Proposition 5.21
If $X_n \xrightarrow{\text{a.s.}} X$, then $X_n \xrightarrow{P} X$.

Proof It is evident that convergence in probability is weaker than
convergence almost surely since $A_{n,\varepsilon} \subset \bigcup_{k=n}^{\infty} A_{k,\varepsilon}$. □




Example 5.15 shows that it is strictly weaker because the $X_n$ satisfy
$P(|X_n| > 0) \to 0$ (hence $X_n \xrightarrow{P} 0$), but they fail to converge to 0 almost
surely.

Comparison of convergence in probability with convergence in $L^2$-norm
or $L^1$-norm is established in the next proposition.


Proposition 5.22
If $X_n \xrightarrow{L^1} X$ or $X_n \xrightarrow{L^2} X$, then $X_n \xrightarrow{P} X$.

Proof Write $Y_n = |X_n - X|$. If $E(Y_n) \to 0$, then $\varepsilon P(Y_n > \varepsilon) \le E(Y_n) \to 0$ as
$n \to \infty$ for each fixed $\varepsilon > 0$. So convergence in $L^1$-norm implies that $X_n \xrightarrow{P}
X$. Because convergence in $L^2$-norm implies convergence in $L^1$-norm, it
therefore also implies convergence in probability. □


The converse of Proposition 5.22 is false, in general. The sequence of
random variables defined in Example 5.16 converges to 0 in probability
since $P(X_n > 0) = \frac{1}{n}$, but we know that it does not converge to 0 in
$L^1$-norm or in $L^2$-norm.


Borel-Cantelli lemmas

The following provides a simple method of checking when $\limsup_{n\to\infty} A_n$
is a set of $P$-measure 0.


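Lemma 5.23 (first Borel-Cantelli lemma)
If $\sum_{n=1}^{\infty} P(A_n) < \infty$, then

\[
P\bigl(\limsup_{n\to\infty} A_n\bigr) = 0.
\]

Proof First note that $\limsup_{n\to\infty} A_n \subset \bigcup_{n=k}^{\infty} A_n$, hence for all $k$

\[
P\bigl(\limsup_{n\to\infty} A_n\bigr) \le P\Bigl(\bigcup_{n=k}^{\infty} A_n\Bigr).
\]

By subadditivity we have $P\bigl(\bigcup_{n=k}^{\infty} A_n\bigr) \le \sum_{n=k}^{\infty} P(A_n) \to 0$ as $k \to \infty$ since
$\sum_{n=1}^{\infty} P(A_n) < \infty$. This completes the proof. □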


Our first application of this lemma gives a partial converse of
Proposition 5.21.

Theorem 5.24
If $X_n \xrightarrow{P} X$, then there is a subsequence $X_{k_n}$ such that $X_{k_n} \xrightarrow{\text{a.s.}} X$.




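Proof We build a sequence $A_n$ of sets encapsulating the undesirable
behaviour of $X_n$, which from the point of view of convergence occurs when
$|X_n - X| > a$ for some real $a$. First take $a = 1$. Since convergence in
probability is given, $P(|X_n - X| > 1) \to 0$ provides $k_1$ such that for all $n \ge k_1$

\[
P(|X_n - X| > 1) < 1.
\]

Next for $a = \frac12$ we find $k_2 > k_1$ such that for all $n \ge k_2$

\[
P\Bigl(|X_n - X| > \frac12\Bigr) < \frac14.
\]

We continue this process, obtaining an increasing sequence of integers $k_n$
such that

\[
P\Bigl(|X_{k_n} - X| > \frac1n\Bigr) < \frac{1}{n^2}.
\]

Now put

\[
A_n = \Bigl\{|X_{k_n} - X| > \frac1n\Bigr\}.
\]

The series $\sum_{n=1}^{\infty} P(A_n)$ converges, being dominated by $\sum_{n=1}^{\infty} \frac{1}{n^2} < \infty$. So the
first Borel-Cantelli lemma yields that $A = \limsup_{n\to\infty} A_n$ has probability
zero. For any given $\varepsilon > 0$ we can always find $n > \frac{1}{\varepsilon}$, and for all such $n$ we have
$\{|X_{k_n} - X| > \varepsilon\} \subset A_n$, so the events $\{|X_{k_n} - X| > \varepsilon\}$ also occur
infinitely often only on a set of probability zero. By Proposition 5.19 this means that
$X_{k_n} \xrightarrow{\text{a.s.}} X$. □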


It is natural to ask what the counterpart to the first Borel-Cantelli lemma
should be when the series $\sum_{n=1}^{\infty} P(A_n)$ diverges. The result we now derive
lies a little deeper, and requires the $A_n$ to be independent events, whereas
the first Borel-Cantelli lemma holds for any sequence of events, without
restriction. Nonetheless, the two results together give us a typical 0-1 law,
which says that, for a sequence of independent random variables, the
probability of 'tail events' (those that involve infinitely many events in the
sequence) is either 0 or 1, but never in between.


Lemma 5.25 (second Borel-Cantelli lemma)
If $A_1, A_2, \dots$ is a sequence of independent events and $\sum_{n=1}^{\infty} P(A_n) = \infty$, then
$P\bigl(\limsup_{n\to\infty} A_n\bigr) = 1$.


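Proof To prove that $P\bigl(\bigcap_{k=1}^{\infty} \bigcup_{n=k}^{\infty} A_n\bigr) = 1$ note that the events $\bigcup_{n=k}^{\infty} A_n$
decrease as $k$ increases, hence

\[
P\Bigl(\bigcap_{k=1}^{\infty} \bigcup_{n=k}^{\infty} A_n\Bigr) = \lim_{k\to\infty} P\Bigl(\bigcup_{n=k}^{\infty} A_n\Bigr).
\]

Thus it is sufficient to show that $P\bigl(\bigcup_{n=k}^{\infty} A_n\bigr) = 1$ for each $k = 1, 2, \dots$ .

Now consider $\bigcap_{n=k}^{m} (\Omega \setminus A_n)$ for a fixed $m > k$. By de Morgan's laws we
have $\Omega \setminus \bigl(\bigcup_{n=k}^{m} A_n\bigr) = \bigcap_{n=k}^{m} (\Omega \setminus A_n)$. The events $\Omega \setminus A_1, \Omega \setminus A_2, \dots$ are also
independent, so for $k = 1, 2, \dots$

\[
P\Bigl(\bigcap_{n=k}^{m} (\Omega \setminus A_n)\Bigr) = \prod_{n=k}^{m} P(\Omega \setminus A_n) = \prod_{n=k}^{m} [1 - P(A_n)].
\]

For any $x \ge 0$ we know that $1 - x \le e^{-x}$ (consider the derivative of $e^{-x} + x - 1$), so that

\[
\prod_{n=k}^{m} [1 - P(A_n)] \le \prod_{n=k}^{m} e^{-P(A_n)} = e^{-\sum_{n=k}^{m} P(A_n)}.
\]

Now recall that we assume that the series $\sum_{n=1}^{\infty} P(A_n)$ diverges. Hence for
any fixed $k$ the partial sums $\sum_{n=k}^{m} P(A_n)$ diverge to $\infty$ as $m \to \infty$. Thus as
$m \to \infty$ the right-hand side of the inequality becomes arbitrarily small.
This proves that

\[
1 - P\Bigl(\bigcup_{n=k}^{m} A_n\Bigr) = P\Bigl(\bigcap_{n=k}^{m} (\Omega \setminus A_n)\Bigr) \to 0 \quad \text{as } m \to \infty.
\]

Finally, write $B_m = \bigcup_{n=k}^{m} A_n$, which is an increasing sequence and its union
is $\bigcup_{m=k}^{\infty} B_m = \bigcup_{n=k}^{\infty} A_n$. Hence

\[
P\Bigl(\bigcup_{n=k}^{\infty} A_n\Bigr) = \lim_{m\to\infty} P(B_m) = 1. \qquad \square
\]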
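The key estimate in this proof can be followed on a concrete case. The sketch below assumes independent events with $P(A_n) = \frac{1}{n}$, a hypothetical choice for which the series diverges; the probability that none of $A_k, \dots, A_m$ occurs is then a telescoping product equal to $\frac{k-1}{m}$. The function names are illustrative.

```python
from fractions import Fraction
from math import exp

def prob_none(k, m):
    """P(no A_n occurs for k <= n <= m) for independent events with P(A_n) = 1/n."""
    result = Fraction(1)
    for n in range(k, m + 1):
        result *= 1 - Fraction(1, n)       # independence: multiply P(Omega \ A_n)
    return result

def exp_bound(k, m):
    """The bound exp(-sum_{n=k}^m P(A_n)) used in the proof."""
    return exp(-sum(1.0 / n for n in range(k, m + 1)))

k = 5
for m in (10, 100, 1000):
    # Exact probability (equal to (k-1)/m) next to the exponential upper bound.
    print(m, float(prob_none(k, m)), exp_bound(k, m))
```

Both columns tend to 0 as $m$ grows, so the union of the events $A_n$ with $n \ge k$ has probability 1 for the chosen $k$.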


n=k 


Example 5.26
The independence requirement limits applications of the second
Borel-Cantelli lemma, but it cannot be dropped. Consider $A \in \mathcal{F}$ with $P(A) \in
(0, 1)$, and let $A_n = A$ for all $n = 1, 2, \dots$ . Then $\sum_{n=1}^{\infty} P(A_n) = \infty$, but
$P\bigl(\limsup_{n\to\infty} A_n\bigr) = P(A) < 1$.


Uniform integrability 


We have shown that convergence in probability is strictly the weakest of the 
four modes of convergence we have studied. To study situations where the 




implications can be reversed we consider sequences of random variables 
with an additional property. The next exercise motivates this property. 


Exercise 5.6 Let $X$ be a random variable defined on a probability
space $(\Omega, \mathcal{F}, P)$. Prove that $X \in L^1(P)$ if and only if for any given
$\varepsilon > 0$ we can find a $K > 0$ such that $\int_{\{|X| > K\}} |X| \, dP < \varepsilon$.


We extend this condition from single random variables to families of
random variables in $L^1(P)$.


Definition 5.27
$S \subset L^1(P)$ is a uniformly integrable family of random variables if for
every $\varepsilon > 0$ there is a $K > 0$ such that $\int_{\{|X| > K\}} |X| \, dP < \varepsilon$ for each $X \in S$.


A uniformly integrable family $S$ of random variables is bounded in
$L^1$-norm since, taking $\varepsilon = 1$ in the definition, we can find $K > 0$ such that for
all $X \in S$

\[
\|X\|_1 = \int_{\{|X| \le K\}} |X| \, dP + \int_{\{|X| > K\}} |X| \, dP \le K + 1.
\]

The sequence $X_n = n 1_{[0, \frac1n]}$ discussed in Example 5.16 is not uniformly
integrable. For any $K > 0$ and $n > K$ we have $\int_{\{|X_n| > K\}} |X_n| \, dP = n P\bigl([0, \tfrac1n]\bigr) = 1$.
On the other hand, $\|X_n\|_1 = 1$ for all $n$, so the sequence $X_1, X_2, \dots$ is
bounded in $L^1$-norm. This shows that boundedness in the $L^1$-norm does
not imply uniform integrability of a family of random variables. However,
the stronger condition of boundedness in $L^2$-norm is sufficient.


Proposition 5.28
If a family $S$ of random variables is bounded in $L^2$-norm, then it is
uniformly integrable.

Proof Given $y > K > 0$, we have $y < \frac{y^2}{K}$. Use this with $y = |X(\omega)|$ for
every $\omega$ such that $|X(\omega)| > K$. Then

\[
\int_{\{|X| > K\}} |X| \, dP \le \frac{1}{K} \int_{\{|X| > K\}} |X|^2 \, dP.
\]

If there is $C > 0$ such that $\|X\|_2 \le C$ for all $X \in S$, we have $\int_{\{|X| > K\}} |X|^2 \, dP \le
\|X\|_2^2 \le C^2$ for all $X \in S$, so the right-hand side above can be made smaller
than any given $\varepsilon > 0$ by taking $K > \frac{C^2}{\varepsilon}$. □
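The contrast between boundedness in $L^1$-norm and in $L^2$-norm can be seen by computing tail integrals in closed form. The sketch below compares $X_n = n 1_{[0, 1/n]}$ from Example 5.16 with the sequence $Y_n = \sqrt{n}\, 1_{[0, 1/n]}$, which is a hypothetical companion example (not taken from the text) satisfying $\|Y_n\|_2 = 1$ for all $n$. The function names and the finite range `n_max` are illustrative.

```python
from math import sqrt

def tail_l1_bounded(n, K):
    """Integral of |X_n| over {|X_n| > K} for X_n = n * 1_{[0, 1/n]}."""
    # X_n equals n on a set of measure 1/n, so the tail integral is 1 once n > K.
    return 1.0 if n > K else 0.0

def tail_l2_bounded(n, K):
    """Same tail integral for Y_n = sqrt(n) * 1_{[0, 1/n]}, which has L2-norm 1."""
    return 1.0 / sqrt(n) if sqrt(n) > K else 0.0

def sup_tail(tail, K, n_max=10_000):
    """Largest tail integral over n = 1, ..., n_max (a finite stand-in for the supremum)."""
    return max(tail(n, K) for n in range(1, n_max + 1))

for K in (10, 50):
    # Second column stays at 1; third column is below the bound C**2 / K = 1 / K.
    print(K, sup_tail(tail_l1_bounded, K), sup_tail(tail_l2_bounded, K), 1.0 / K)
```

For the first sequence the tail integrals do not shrink however large $K$ is taken, while for the second they stay below the bound $\frac{C^2}{K}$ from the proof with $C = 1$.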




The next exercise exhibits a uniformly integrable sequence in $L^1(P)$
(compare this with the dominated convergence theorem).


Exercise 5.7 Show that if $X_1, X_2, \dots$ is a sequence of random
variables dominated by an integrable random variable $Y \ge 0$ (that is,
$|X_n| \le Y$, $P$-a.s. for all $n$), then the sequence is uniformly integrable.


A particularly useful uniformly integrable family in $L^1(P)$ is the
following.


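Example 5.29
Suppose that $X \in L^1(P)$ and that

\[
S = \{E(X \mid \mathcal{G}) : \mathcal{G} \subset \mathcal{F} \text{ is a } \sigma\text{-field}\}.
\]

By Jensen's inequality with $\varphi(x) = |x|$, we have $|E(X \mid \mathcal{G})| \le E(|X| \mid \mathcal{G})$, so

\[
\begin{aligned}
K P(|E(X \mid \mathcal{G})| > K) &\le \int_{\{|E(X \mid \mathcal{G})| > K\}} |E(X \mid \mathcal{G})| \, dP \\
&\le \int_{\{|E(X \mid \mathcal{G})| > K\}} E(|X| \mid \mathcal{G}) \, dP \\
&= \int_{\{|E(X \mid \mathcal{G})| > K\}} |X| \, dP \le \|X\|_1
\end{aligned}
\]

since $\{|E(X \mid \mathcal{G})| > K\} \in \mathcal{G}$. It follows that

\[
P(|E(X \mid \mathcal{G})| > K) \le \frac{\|X\|_1}{K}.
\]

Moreover, to prove that $S$ is uniformly integrable we only need to show
that for every $\varepsilon > 0$ there is a $K > 0$ such that $\int_{\{|E(X \mid \mathcal{G})| > K\}} |X| \, dP < \varepsilon$ for
each $\sigma$-field $\mathcal{G} \subset \mathcal{F}$. Suppose that this is not the case. Then there would
exist an $\varepsilon > 0$ such that for each $n = 1, 2, \dots$ one could find a $\sigma$-field
$\mathcal{G}_n \subset \mathcal{F}$ such that $\int_{A_n} |X| \, dP \ge \varepsilon$, where $A_n = \{|E(X \mid \mathcal{G}_n)| > 2^n\}$. We know
that $P(A_n) \le 2^{-n} \|X\|_1$, so $\sum_{n=1}^{\infty} P(A_n) < \infty$. By the first Borel-Cantelli
lemma, $P\bigl(\bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m\bigr) = 0$. As a result, by the dominated convergence
theorem,

\[
0 < \varepsilon \le \int_{A_n} |X| \, dP \le \int_{\Omega} |X| \, 1_{\bigcup_{m=n}^{\infty} A_m} \, dP \to \int_{\Omega} |X| \, 1_{\bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m} \, dP = 0 \quad \text{as } n \to \infty,
\]

which is a contradiction.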




In particular, this shows that for an integrable random variable $X$ and
a sequence of $\sigma$-fields $\mathcal{F}_1, \mathcal{F}_2, \dots \subset \mathcal{F}$, the sequence $X_n = E(X \mid \mathcal{F}_n)$ is
uniformly integrable.


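For uniformly integrable sequences, convergence in probability implies
convergence in $L^1$-norm.


Theorem 5.30
Let $Y_1, Y_2, \dots$ be a uniformly integrable sequence such that $Y_n \xrightarrow{P} 0$. Then
$\|Y_n\|_1 \to 0$ as $n \to \infty$.


Proof As the sequence is uniformly integrable, given $\varepsilon > 0$ we can find
$K > \frac{\varepsilon}{3}$ such that $\int_{\{|Y_n| > K\}} |Y_n| \, dP < \frac{\varepsilon}{3}$ for all $n$. Also, $\lim_{n\to\infty} P(|Y_n| > \alpha) = 0$ for every
$\alpha > 0$ since $Y_n \xrightarrow{P} 0$. So we can find $N > 0$ with $P\bigl(|Y_n| > \frac{\varepsilon}{3}\bigr) < \frac{\varepsilon}{3K}$ if $n \ge N$.
For any $n = 1, 2, \dots$ put

\[
A_n = \{|Y_n| > K\}, \qquad B_n = \Bigl\{K \ge |Y_n| > \frac{\varepsilon}{3}\Bigr\}, \qquad C_n = \Bigl\{|Y_n| \le \frac{\varepsilon}{3}\Bigr\}.
\]

Then

\[
\begin{aligned}
\|Y_n\|_1 &= \int_{A_n} |Y_n| \, dP + \int_{B_n} |Y_n| \, dP + \int_{C_n} |Y_n| \, dP \\
&\le \frac{\varepsilon}{3} + K P\Bigl(|Y_n| > \frac{\varepsilon}{3}\Bigr) + \frac{\varepsilon}{3} P\Bigl(|Y_n| \le \frac{\varepsilon}{3}\Bigr) \\
&< \varepsilon
\end{aligned}
\]

for each $n \ge N$. Hence $\|Y_n\|_1 \to 0$ as $n \to \infty$. □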


We note an immediate consequence. 


Corollary 5.31
If $X_1, X_2, \dots$ is a uniformly integrable sequence and $X_n \xrightarrow{\text{a.s.}} X$, then $X_n \xrightarrow{L^1} X$.


Example 5.32
Suppose that $X$ is square-integrable and $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots \subset \mathcal{F}$ is an increasing
sequence of $\sigma$-fields contained in $\mathcal{F}$. Suppose further that $X_n \xrightarrow{\text{a.s.}} X$, where
$X_n = E(X \mid \mathcal{F}_n)$ for each $n = 1, 2, \dots$ . Then $X_n \xrightarrow{L^1} X$.
To see this, apply Jensen's inequality with $\varphi(x) = x^2$, which shows that
for each $n = 1, 2, \dots$

\[
X_n^2 = (E(X \mid \mathcal{F}_n))^2 \le E(X^2 \mid \mathcal{F}_n), \quad P\text{-a.s.},
\]

so that

\[
E(X_n^2) \le E(E(X^2 \mid \mathcal{F}_n)) = E(X^2).
\]

Hence $\|X_n\|_2 \le \|X\|_2$ for all $n = 1, 2, \dots$, so that the sequence $X_1, X_2, \dots$ is
bounded in $L^2$-norm, hence it is uniformly integrable by Proposition 5.28.
Corollary 5.31 now proves our claim.


5.3 Sequences of i.i.d. random variables 


The limit behaviour of independent sequences is of particular interest when 
all the random variables in the sequence share the same distribution. 


Definition 5.33

A sequence $X_1, X_2, \dots$ of random variables on a probability space $(\Omega, \mathcal{F}, P)$
is identically distributed if $F_{X_n}(x) = F_{X_1}(x)$ (that is, $P(X_n \le x) = P(X_1 \le
x)$) for all $n = 1, 2, \dots$ and all $x \in \mathbb{R}$. If, in addition, the random variables are
independent, we call it a sequence of independent identically distributed
(i.i.d.) random variables.


Consider the arithmetic averages $\frac1n \sum_{i=1}^n X_i$ for a sequence of i.i.d.
random variables as $n$ tends to $\infty$. Since the $X_i$ share the same distribution,
their expectations are the same, as are their variances. Convergence of these
averages in $L^2$-norm, and hence in probability, follows from the basic
properties of expectation and variance.


Theorem 5.34 (weak law of large numbers)
Let $X_1, X_2, \dots$ be a sequence of i.i.d. random variables with finite
expectation $m$ and variance $\sigma^2$. Then $\frac1n \sum_{i=1}^n X_i \xrightarrow{L^2} m$, and hence $\frac1n \sum_{i=1}^n X_i \xrightarrow{P} m$ as
$n \to \infty$.


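Proof Let $S_n = X_1 + \cdots + X_n$ for each $n = 1, 2, \dots$ . First note that
$E(S_n) = nm$ for each $n = 1, 2, \dots$ by the linearity of expectation. Hence
$E\bigl(\frac{S_n}{n}\bigr) = m$ and, by the properties of variance for the sum of independent
random variables,

\[
\Bigl\|\frac{S_n}{n} - m\Bigr\|_2^2 = E\Bigl(\Bigl(\frac{S_n}{n} - m\Bigr)^2\Bigr) = \operatorname{Var}\Bigl(\frac{S_n}{n}\Bigr)
= \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{\sigma^2}{n} \to 0
\]

as $n \to \infty$. This means that $\frac{S_n}{n} \xrightarrow{L^2} m$. Convergence in probability follows
from Proposition 5.22. □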
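For a sequence of fair coin tosses the quantities in this proof can be computed exactly. The sketch below assumes i.i.d. random variables taking the values 0 and 1 with probability $\frac12$ each (so $m = \frac12$ and $\sigma^2 = \frac14$), for which $S_n$ has a binomial distribution; this particular distribution and the function names are chosen for illustration.

```python
from fractions import Fraction
from math import comb

def mean_square_error(n):
    """E((S_n/n - 1/2)^2) for S_n = X_1 + ... + X_n with i.i.d. fair 0/1 coins."""
    half = Fraction(1, 2)
    return sum(
        Fraction(comb(n, s), 2 ** n) * (Fraction(s, n) - half) ** 2
        for s in range(n + 1)
    )

def prob_deviation(n, eps):
    """P(|S_n/n - 1/2| > eps), computed exactly from the binomial distribution."""
    half = Fraction(1, 2)
    return sum(
        Fraction(comb(n, s), 2 ** n)
        for s in range(n + 1)
        if abs(Fraction(s, n) - half) > eps
    )

eps = Fraction(1, 10)
for n in (10, 100, 1000):
    # Squared L2-distance (equal to sigma^2 / n = 1 / (4n)) and the deviation probability.
    print(n, mean_square_error(n), float(prob_deviation(n, eps)))
```

The first column of results equals $\frac{\sigma^2}{n}$ exactly, and the deviation probability is bounded by $\frac{\sigma^2}{n \varepsilon^2}$, which is the estimate behind Proposition 5.22 applied to the squared differences.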


The law of large numbers provides a mathematical statement of our intu- 
ition that the average value over a large number of independent realizations 
of a random variable X is likely to be close to its expectation E(X). 


Remark 5.35

The weak law of large numbers can be strengthened considerably.
According to Kolmogorov's strong law of large numbers, for a sequence
$X_1, X_2, \dots$ of i.i.d. random variables the averages $\frac1n \sum_{i=1}^n X_i$ converge almost
surely to $m$ if the $X_n$ are integrable. We shall not prove this here,¹ but will
focus instead on the Central Limit Theorem.


Constructing an i.i.d. sequence with given distribution 


Our aim is to construct i.i.d. sequences of random variables with a given 
distribution. In applications of probability theory one often pays little atten- 
tion to the probability space on which such random variables are defined. 
The main interest is in the distribution of the random variables. In part, this 
also applies in financial modelling, but here the knowledge of a particular 
realisation of the sample space may in fact be useful for computer simula- 
tions. From this point of view the choice of Q = [0, 1] with Borel sets and 
Lebesgue measure plays a special role since random sampling in this space 
is provided in many standard computer packages. 

The simplest case, developed with binomial tree applications in mind, is
a sequence of i.i.d. random variables $X_n$, each taking just two values, 1 or
0, with equal probabilities. The good news is that such a sequence can
be built with $\Omega = [0, 1]$ as the domain, so that $X_n : [0, 1] \to \mathbb{R}$ for each
$n = 1, 2, \dots$ . To this end, for each $n = 1, 2, \dots$ and for each $\omega \in [0, 1]$ we


¹ For details see M. Capiński and E. Kopp, Measure, Integral and Probability, 2nd
edition, Springer-Verlag 2004.




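put

\[
X_n(\omega) =
\begin{cases}
1 & \text{if } \omega \in \left[0, \frac{1}{2^n}\right) \cup \left[\frac{2}{2^n}, \frac{3}{2^n}\right) \cup \cdots \cup \left[\frac{2^n - 2}{2^n}, \frac{2^n - 1}{2^n}\right), \\
0 & \text{otherwise.}
\end{cases}
\]

It is routine to check that these random variables are independent and have
the desired distribution.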
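The routine check can be carried out mechanically for the first few variables. The sketch below takes $N = 4$ (an arbitrary illustrative choice), uses the fact that $X_1, \dots, X_N$ are constant on each dyadic interval $[i/2^N, (i+1)/2^N)$, and accumulates the Lebesgue measure of each joint pattern of values. The names `X`, `cells` and `measure` are illustrative.

```python
from fractions import Fraction
from itertools import product

def X(n, w):
    """X_n(w) = 1 if w lies in [2i/2^n, (2i+1)/2^n) for some i, and 0 otherwise."""
    if w == 1:
        return 0                    # w = 1 lies in none of the intervals
    return 1 if int(w * 2 ** n) % 2 == 0 else 0

N = 4
# One representative point (the midpoint) from each dyadic cell of length 2**-N.
cells = [Fraction(2 * i + 1, 2 ** (N + 1)) for i in range(2 ** N)]

# Lebesgue measure of each joint pattern (X_1, ..., X_N) = (b_1, ..., b_N).
measure = {}
for w in cells:
    pattern = tuple(X(n, w) for n in range(1, N + 1))
    measure[pattern] = measure.get(pattern, 0) + Fraction(1, 2 ** N)

print(len(measure), set(measure.values()))
```

Every one of the $2^N$ patterns has measure $2^{-N}$, which is the product of the individual probabilities $\frac12$, so $X_1, \dots, X_N$ are independent with the required distribution.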

In the construction of the Wiener process in [SCF] we need a sequence 
of i.i.d. random variables uniformly distributed on [0,1] and defined on 
Q = [0, 1] with Borel sets and Lebesgue measure. Such a sequence can be 
obtained as follows. 

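(i) Set up an infinite matrix of independent random variables $X_{ij}$ on
$[0, 1]$ so that $m(X_{ij} = 0) = m(X_{ij} = 1) = \frac12$ by relabelling the
sequence $X_n$ constructed above, assigning its terms to the entries of the
following matrix along successive diagonals:

\[
\begin{array}{lllll}
X_{11} & X_{12} & X_{13} & X_{14} & \cdots \\
X_{21} & X_{22} & X_{23} & \cdots & \\
X_{31} & X_{32} & \cdots & & \\
X_{41} & \cdots & & & \\
\vdots & & & &
\end{array}
\]

(ii) Define

\[
Z_i = \sum_{j=1}^{\infty} \frac{X_{ij}}{2^j}
\]

for each $i = 1, 2, \dots$ . The series is convergent for each $\omega$ since

\[
0 \le \sum_{j=1}^{\infty} \frac{X_{ij}}{2^j} \le \sum_{j=1}^{\infty} \frac{1}{2^j} = 1.
\]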
j 2j 
j=1 j=1 


It turns out that $Z_i$ is uniformly distributed on $[0, 1]$, that is, $F_{Z_i}(x) = x$
for each $x \in [0, 1]$. Indeed, for any $n$ the sequence $X_{i1}, \dots, X_{in}$ is equal to
a specific $n$-element sequence of 0s and 1s with probability $\frac{1}{2^n}$, so the sum
$\sum_{j=1}^{n} \frac{X_{ij}}{2^j}$ is equal to $\frac{k}{2^n}$ with probability $\frac{1}{2^n}$ for each $k = 0, 1, \dots, 2^n - 1$. Given
any $x \in [0, 1]$, there are $[2^n x] + 1$ numbers of the form $\frac{k}{2^n}$ in the interval
$[0, x]$ (here $[a]$ denotes the integer part of $a$), so

\[
m\Bigl(\sum_{j=1}^{n} \frac{X_{ij}}{2^j} \le x\Bigr) = \frac{[2^n x] + 1}{2^n} \to x \quad \text{as } n \to \infty.
\]

Since $A_n = \bigl\{\sum_{j=1}^{n} \frac{X_{ij}}{2^j} \le x\bigr\}$ is a decreasing sequence of sets with $\bigcap_{n=1}^{\infty} A_n =
\{Z_i \le x\}$, we therefore have

\[
F_{Z_i}(x) = m(Z_i \le x) = \lim_{n\to\infty} m(A_n) = x.
\]
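The counting step in this argument can be verified by brute force for small $n$. The sketch below enumerates all $2^n$ equally likely bit patterns, which stands in for the independent variables $X_{i1}, \dots, X_{in}$, and compares the resulting probability with the formula $\frac{[2^n x] + 1}{2^n}$; the point $x = \frac13$ and the function names are illustrative choices.

```python
from fractions import Fraction
from itertools import product
from math import floor

def partial_sum_cdf(n, x):
    """m(sum_{j<=n} X_ij / 2^j <= x), with the n bits treated as independent fair 0/1 values."""
    count = 0
    for bits in product((0, 1), repeat=n):
        value = sum(Fraction(b, 2 ** j) for j, b in enumerate(bits, start=1))
        if value <= x:
            count += 1
    return Fraction(count, 2 ** n)       # each pattern has probability 2**-n

def formula(n, x):
    """([2^n x] + 1) / 2^n, valid for 0 <= x < 1."""
    return Fraction(floor(2 ** n * x) + 1, 2 ** n)

x = Fraction(1, 3)
values = [partial_sum_cdf(n, x) for n in range(1, 9)]
for n, v in enumerate(values, start=1):
    # Exact probability, the counting formula, and the distance from the limit x.
    print(n, v, formula(n, x), float(v - x))
```

The probabilities are non-increasing in $n$, reflecting the fact that the sets $A_n$ decrease, and they differ from $x$ by at most $2^{-n}$.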


Moreover, the $Z_i$ are independent. We verify that $Z_1, Z_2$ are independent.
Once this is done, routine induction will extend this to the finite collection
$Z_1, Z_2, \dots, Z_N$ for any fixed $N$, which is all that we need. Note that $Z_{1n} =
\sum_{j=1}^{n} \frac{X_{1j}}{2^j}$ and $Z_{2n} = \sum_{j=1}^{n} \frac{X_{2j}}{2^j}$ are independent because so are the random
variables $X_{11}, \dots, X_{1n}, X_{21}, \dots, X_{2n}$. This implies that for any $A_1, A_2 \in \mathcal{B}(\mathbb{R})$

\[
m(Z_{1n} \in A_1, Z_{2n} \in A_2) = m(Z_{1n} \in A_1)\, m(Z_{2n} \in A_2).
\]

We can write the measure of each of these sets as an integral of the indicator
function of that set. Now since $Z_{1n} \to Z_1$ and $Z_{2n} \to Z_2$ almost surely, we
have $1_{\{Z_{1n} \in A_1\}} \to 1_{\{Z_1 \in A_1\}}$ and $1_{\{Z_{2n} \in A_2\}} \to 1_{\{Z_2 \in A_2\}}$ almost surely as $n \to \infty$. It
follows by dominated convergence (all indicator functions being bounded
by 1) that

\[
\begin{aligned}
m(Z_1 \in A_1, Z_2 \in A_2) &= \lim_{n\to\infty} m(Z_{1n} \in A_1, Z_{2n} \in A_2) \\
&= \lim_{n\to\infty} m(Z_{1n} \in A_1)\, m(Z_{2n} \in A_2) \\
&= m(Z_1 \in A_1)\, m(Z_2 \in A_2)
\end{aligned}
\]

for any $A_1, A_2 \in \mathcal{B}(\mathbb{R})$, proving that $Z_1, Z_2$ are independent.


Remark 5.36 

It is possible to associate a sequence of i.i.d. random variables $Z_1, Z_2, \ldots$
defined on $[0,1]$ with an i.i.d. sequence $Y_1, Y_2, \ldots$ defined on the space
$\Omega = [0, 1]^{\mathbb{N}}$ consisting of all functions $\omega : \mathbb{N} \to [0, 1]$ so that the $Y_n$ have
the same distribution as the $Z_n$. To this end we consider $Z = (Z_1, Z_2, \ldots)$ as
a function $Z : [0,1] \to [0,1]^{\mathbb{N}}$, and equip $[0, 1]^{\mathbb{N}}$ with the $\sigma$-field $\mathcal{F}$
consisting of all sets $A \subset [0, 1]^{\mathbb{N}}$ such that $\{Z \in A\} \in \mathcal{B}(\mathbb{R})$ and with probability
measure $P : \mathcal{F} \to [0,1]$ such that $P(A) = m(Z \in A)$. Then $Y_n(\omega) = \omega(n)$,
mapping each $\omega \in [0,1]^{\mathbb{N}}$ into $\omega(n)$, defines a sequence of i.i.d. random
variables on $[0, 1]^{\mathbb{N}}$ such that $P_{Y_n} = P_{Z_n}$ for each $n \in \mathbb{N}$.


5.4 Convergence in distribution 


Let X1, X2,... and X be random variables. We introduce a further notion of 
convergence concerned with their distributions. 




Definition 5.37
A sequence of random variables $X_1, X_2, \ldots$ is said to converge in distribution
(or in law) to a random variable $X$, written $X_n \Rightarrow X$, if
\[
\lim_{n \to \infty} F_{X_n}(x) = F_X(x)
\]
at every continuity point $x$ of $F_X$, that is, at every $x \in \mathbb{R}$ such that
\[
\lim_{y \to x} F_X(y) = F_X(x).
\]


Let us compare this mode of convergence with those developed ear- 
lier. In fact, convergence in distribution is the weakest (that is, easiest to 
achieve) convergence notion we have so far encountered. 


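Theorem 5.38
If $X_n \xrightarrow{P} X$, then $X_n \Rightarrow X$.


Proof For any $\varepsilon > 0$
\begin{align*}
\{X \le x - \varepsilon\} &\subset \{X_n \le x\} \cup \{|X_n - X| > \varepsilon\}, \\
\{X_n \le x\} &\subset \{X \le x + \varepsilon\} \cup \{|X_n - X| > \varepsilon\},
\end{align*}
so
\begin{align*}
F_X(x - \varepsilon) &\le F_{X_n}(x) + P(|X_n - X| > \varepsilon), \\
F_{X_n}(x) &\le F_X(x + \varepsilon) + P(|X_n - X| > \varepsilon).
\end{align*}
If $X_n \xrightarrow{P} X$, then $P(|X_n - X| > \varepsilon) \to 0$ as $n \to \infty$. It follows that
\[
F_X(x - \varepsilon) \le \liminf_{n \to \infty} F_{X_n}(x) \le \limsup_{n \to \infty} F_{X_n}(x) \le F_X(x + \varepsilon).
\]
Suppose that $x \in \mathbb{R}$ is a continuity point of $F_X$. Then $F_X(x - \varepsilon) \to F_X(x)$
and $F_X(x + \varepsilon) \to F_X(x)$ as $\varepsilon \searrow 0$. As a result,
\[
\lim_{n \to \infty} F_{X_n}(x) = F_X(x),
\]
proving that $X_n \Rightarrow X$. □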


The converse of Theorem 5.38 is not true. In fact, convergence in probability
does not even make sense if $X_n$ and $X$ are defined on different probability
spaces, which is possible since the definition of convergence in distribution
makes no direct reference to the underlying probability space.




Example 5.39 

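Although $\lim_{n \to \infty} P(|X_n - X| > \varepsilon) = 0$ in general makes no sense unless $X_n$
and $X$ are defined on the same probability space, we can arrive at a converse
of Theorem 5.38 in a very special (indeed, trivial) case. Suppose that $X$ is
constant, that is, $X(\omega) = c$ for all $\omega \in \Omega$. Its distribution function is
\[
F_c(x) =
\begin{cases}
0 & \text{for } x < c, \\
1 & \text{for } x \ge c.
\end{cases}
\]
Now we can show that if $X_n \Rightarrow c$ and all the $X_n$ are defined on the same
probability space, then $X_n \xrightarrow{P} c$.
To see this, fix $\varepsilon > 0$. We have
\begin{align*}
P(|X_n - c| > \varepsilon) &\le P(X_n \le c - \varepsilon) + P(X_n > c + \varepsilon) \\
&= F_{X_n}(c - \varepsilon) + 1 - F_{X_n}(c + \varepsilon) \\
&\to F_c(c - \varepsilon) + 1 - F_c(c + \varepsilon) = 0
\end{align*}
as $n \to \infty$.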


Exercise 5.8 Suppose that $|X_n - Y_n| \xrightarrow{P} 0$ and $Y_n \Rightarrow Y$. Show that
$X_n \Rightarrow Y$.


Exercise 5.9 Show that if $X_n \Rightarrow X$, then $-X_n \Rightarrow -X$.


Exercise 5.10 Show that if $X_n \Rightarrow X$ and $Y_n \xrightarrow{P} c$, where $c$ is a
constant, then $X_n + Y_n \Rightarrow X + c$.


Since convergence in distribution is the weakest notion of convergence 
that we have defined, we may hope for convergence theorems that tell us 
much more about the distribution of the limit random variable than hith- 
erto. The most important limit theorem in probability theory, the Central 
Limit Theorem (CLT), does this for sequences of independent random vari- 
ables and highlights the importance of the normal distribution. This is very 
fortunate, since for normally distributed random variables we have a very
simple test for independence: they are independent if and only if they are
uncorrelated, see Exercise 3.38. 


Example 5.40 

We will be concerned solely with the CLT for i.i.d. sequences, although 
much more general results are known. The classical example of conver- 
gence in distribution describes how the distributions of a sequence of bino- 
mial random variables, suitably normalised, will approximate the standard 
normal distribution. 

We phrase this in terms of tossing a fair coin arbitrarily many times.
After $n$ tosses there are $2^n$ possible outcomes, consisting of all possible
$n$-tuples of H and T, where H stands for 'heads' and T for 'tails'. We denote
the set of all such outcomes by $\Omega_n$. We assume that at each toss H and
T are equally likely and that successive tosses are independent. By this
we mean that the random variables $X_1, \ldots, X_n$ defined on $n$-tuples
$\omega = (\omega_1, \omega_2, \ldots, \omega_n)$ in $\Omega_n$ by setting
\[
X_i(\omega) =
\begin{cases}
1 & \text{if } \omega_i = \mathrm{H}, \\
0 & \text{if } \omega_i = \mathrm{T},
\end{cases}
\qquad i = 1, \ldots, n,
\]
are independent. Let $P_n$ denote the normalised counting measure on all subsets of $\Omega_n$,
that is, $P_n(A) = \frac{|A|}{2^n}$, where $|A|$ denotes the number of $n$-tuples belonging to
$A \subset \Omega_n$. The sum $S_n = \sum_{i=1}^{n} X_i$ (which counts the number of 'heads' in
$n$ tosses) has the binomial distribution with parameters $n$ and $p = \frac{1}{2}$; see
Example 2.2.

We have $E(X_i) = \frac{1}{2}$ and $\mathrm{Var}(X_i) = \frac{1}{4}$ for all $i = 1, 2, \ldots, n$, which implies
that the proportion of 'heads' $\frac{1}{n} S_n$ has expectation $\frac{1}{2}$ and variance $\frac{1}{4n}$. The
weak law of large numbers (see Theorem 5.34) implies that $\frac{1}{n} S_n$ converges
to $\frac{1}{2}$ in probability, i.e. for each $\varepsilon > 0$
\[
\lim_{n \to \infty} P_n\left(\left|\frac{S_n}{n} - \frac{1}{2}\right| \le \varepsilon\right) = 1.
\]
In other words, given $\varepsilon > 0$, the fraction of $n$-tuples for which the proportion
of 'heads' in $n$ tosses of a fair coin differs from $\frac{1}{2}$ by at most $\varepsilon$ tends
to 1 as $n \to \infty$. This supports our belief that for
large $n$, a sequence of $n$ tosses of a fair coin will, in most instances, yield
approximately $\frac{n}{2}$ 'heads'.

However, this leaves open the question of the limiting distribution of the
number of 'heads'. The answer will be given by the simplest (and oldest)
form of the CLT, known as the de Moivre–Laplace theorem, see Corollary 5.53 below.
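
The weak law statement in this example can be illustrated with exact arithmetic. The following sketch computes $P_n(|S_n/n - \frac{1}{2}| \le \varepsilon)$ directly from binomial coefficients; the function name `prob_within` and the choice $\varepsilon = \frac{1}{10}$ are ours, not the text's.

```python
from fractions import Fraction
from math import comb

def prob_within(n, eps):
    """P_n(|S_n/n - 1/2| <= eps) for n fair tosses, computed exactly."""
    eps = Fraction(eps)
    favourable = sum(
        comb(n, k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - Fraction(1, 2)) <= eps
    )
    return Fraction(favourable, 2**n)

for n in (10, 100, 1000):
    print(n, float(prob_within(n, Fraction(1, 10))))
```

For $n = 10$ the favourable outcomes are those with 4, 5 or 6 'heads', giving $672/1024 = 21/32$; for larger $n$ the probability moves towards 1.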


5.5 Characteristic functions and inversion formula 


We revisit characteristic functions, which were introduced in Section 2.4, 
as these provide the key to finding limit distributions. We begin with a 
result showing that the distribution of a random variable is determined by 
its characteristic function. 


Theorem 5.41 (inversion formula)
If the distribution function $F_X$ of a random variable $X$ is continuous at
$a, b \in \mathbb{R}$, then
\[
F_X(b) - F_X(a) = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it}\, \phi_X(t)\, dm(t). \tag{5.10}
\]
Random variables $X$ and $Y$ have the same distribution if and only if they
have the same characteristic function.


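Proof For any $a < b$
\[
\frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it}\, \phi_X(t)\, dt
= \frac{1}{2\pi} \int_{[-T,T]} \frac{e^{-ita} - e^{-itb}}{it} \left( \int_{\mathbb{R}} e^{itx}\, dP_X(x) \right) dm(t).
\]
Since
\[
\left| \frac{e^{-ita} - e^{-itb}}{it}\, e^{itx} \right| = \left| \int_a^b e^{-ity}\, dy \right| \le b - a
\]
is integrable over $\mathbb{R} \times [-T, T]$ with respect to the product measure $P_X \otimes m$,
Fubini's theorem gives
\begin{align*}
\frac{1}{2\pi} \int_{[-T,T]} \frac{e^{-ita} - e^{-itb}}{it} \left( \int_{\mathbb{R}} e^{itx}\, dP_X(x) \right) dm(t)
&= \frac{1}{2\pi} \int_{\mathbb{R}} \left( \int_{[-T,T]} \frac{e^{-ita} - e^{-itb}}{it}\, e^{itx}\, dm(t) \right) dP_X(x) \\
&= \int_{\mathbb{R}} I(x, T)\, dP_X(x),
\end{align*}
where
\begin{align*}
I(x, T) &= \frac{1}{2\pi} \int_{[-T,T]} \frac{e^{-ita} - e^{-itb}}{it}\, e^{itx}\, dm(t) \\
&= \frac{1}{2\pi} \int_{-T}^{T} \frac{\sin t(x - a) - \sin t(x - b)}{t}\, dt
+ \frac{1}{2\pi} \int_{-T}^{T} \frac{\cos t(x - a) - \cos t(x - b)}{it}\, dt.
\end{align*}
The last integral is equal to 0 because the integrand is an odd function.
Substituting $y = t(x - a)$ and $z = t(x - b)$, we obtain
\[
I(x, T) = \frac{1}{2\pi} \int_{-T(x-a)}^{T(x-a)} \frac{\sin y}{y}\, dy - \frac{1}{2\pi} \int_{-T(x-b)}^{T(x-b)} \frac{\sin z}{z}\, dz.
\]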

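It is shown in Exercise 5.11 below that
\[
\int_r^s \frac{\sin y}{y}\, dy \to \pi
\]
as $s \to \infty$ and $r \to -\infty$. Thus,
\[
\lim_{T \to \infty} I(x, T) =
\begin{cases}
0 & \text{if } x < a \text{ or } x > b, \\
\frac{1}{2} & \text{if } x = a \text{ or } x = b, \\
1 & \text{if } a < x < b.
\end{cases}
\]
By dominated convergence, see Exercise 1.36, we have
\begin{align*}
\lim_{T \to \infty} \frac{1}{2\pi} \int_{[-T,T]} \frac{e^{-ita} - e^{-itb}}{it}\, \phi_X(t)\, dm(t)
&= \lim_{T \to \infty} \int_{\mathbb{R}} I(x, T)\, dP_X(x) \\
&= \int_{\mathbb{R}} 1_{(a,b)}(x)\, dP_X(x) \\
&= P_X((a, b)) \\
&= F_X(b) - F_X(a),
\end{align*}
if $a, b$ are continuity points of $F_X$, so that $P_X(\{a\}) = P_X(\{b\}) = 0$.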

Finally, we show that (5.10) implies uniqueness. Suppose that $X$ and $Y$
have the same characteristic function, $\phi_X = \phi_Y$. If $a, b \in \mathbb{R}$ are continuity
points of $F_X$ and $F_Y$, we have by (5.10)
\[
F_X(b) - F_X(a) = F_Y(b) - F_Y(a).
\]
It follows that the continuity points of $F_X$ and of $F_Y$ coincide, and by letting
$a \to -\infty$ we obtain $F_X(b) = F_Y(b)$ at all continuity points $b$ of $F_X$ and $F_Y$.
By right-continuity and the fact that the points where continuity fails form
an at most countable set, we obtain $F_X = F_Y$. ∎
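
As a numerical sanity check of the inversion formula (5.10), one can take the standard normal distribution, whose characteristic function is $e^{-t^2/2}$, and approximate the integral by a midpoint rule. This is only a sketch: the truncation level `T`, the number of `steps` and the function names are illustrative choices, and the distribution function is computed with `math.erf`.

```python
import cmath
import math

def phi_std_normal(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-t * t / 2)

def inversion_integral(a, b, phi, T=10.0, steps=20000):
    """Midpoint-rule approximation of
    (1/2pi) * integral over [-T, T] of (e^{-ita} - e^{-itb}) / (it) * phi(t) dt."""
    h = 2 * T / steps
    total = 0j
    for k in range(steps):
        t = -T + (k + 0.5) * h  # with an even number of steps, t is never 0
        total += (cmath.exp(-1j * t * a) - cmath.exp(-1j * t * b)) / (1j * t) * phi(t)
    return (total * h / (2 * math.pi)).real

def std_normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

a, b = -1.0, 1.0
approx = inversion_integral(a, b, phi_std_normal)
exact = std_normal_cdf(b) - std_normal_cdf(a)
print(approx, exact)
```

Both numbers should agree to several decimal places, since $e^{-t^2/2}$ is negligible outside $[-10, 10]$.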




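Exercise 5.11 Show that
\[
\lim_{T \to \infty} \int_0^T \frac{\sin x}{x}\, dx = \frac{\pi}{2}.
\]


Exercise 5.12 Show that if $\int_{-\infty}^{\infty} |\phi_X(t)|\, dt < \infty$, then $X$ has a density
given by
\[
f_X(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \phi_X(t)\, dm(t).
\]


Exercise 5.13 Suppose that $X$ is an integer-valued random variable.
Show that for each integer $n$
\[
P\{X = n\} = \frac{1}{2\pi} \int_0^{2\pi} e^{-itn} \phi_X(t)\, dt.
\]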


5.6 Limit theorems for weak convergence 


It is useful to consider convergence in distribution in terms of probability
measures defined on the $\sigma$-field $\mathcal{B}(\mathbb{R})$ of Borel sets. We called such
measures probability distributions in Definition 2.1. A probability distribution
$P$ uniquely determines a distribution function $F : \mathbb{R} \to [0,1]$ by
$F(x) = P((-\infty, x])$ for each $x \in \mathbb{R}$. Conversely, if two distribution functions
agree, then the corresponding probability measures agree on the collection
of all intervals of the form $(-\infty, x]$, where $x \in \mathbb{R}$, and this collection
is closed under intersection and generates the $\sigma$-field $\mathcal{B}(\mathbb{R})$, so by
Lemma 3.58 these measures agree on $\mathcal{B}(\mathbb{R})$. Thus there is a one-to-one
correspondence between distribution functions and probability distributions.


Definition 5.42
Given probability measures $P_n$ and $P$ defined on $\mathcal{B}(\mathbb{R})$, we say that $P_n$
converge weakly to $P$ and write $P_n \Rightarrow P$ if
\[
\lim_{n \to \infty} P_n((-\infty, x]) = P((-\infty, x])
\]
for each $x \in \mathbb{R}$ such that $P(\{x\}) = 0$.


Observe that if $P_n$ and $P$ are the probability distributions of some random
variables $X_n$ and $X$, then $P_n \Rightarrow P$ is equivalent to $X_n \Rightarrow X$. That is, in this
case weak convergence is the same as convergence in distribution.


Theorem 5.43 (Skorohod representation)

Suppose that $P_n$ and $P$ are probability measures defined on $\mathcal{B}(\mathbb{R})$ such that
$P_n \Rightarrow P$. Then there are random variables $X_n$ and $X$ on the probability
space $\Omega = (0,1)$ (with Borel sets and Lebesgue measure) such that $P_{X_n} = P_n$,
$P_X = P$ and $\lim_{n \to \infty} X_n(\omega) = X(\omega)$ for each $\omega \in (0, 1)$.


Proof Let $F(x) = P((-\infty, x])$ and $F_n(x) = P_n((-\infty, x])$ for each $x \in \mathbb{R}$.
We put
\begin{align*}
Y(\omega) &= \inf\{x \in \mathbb{R} : \omega \le F(x)\}, \\
Y_n(\omega) &= \inf\{x \in \mathbb{R} : \omega \le F_n(x)\}
\end{align*}
for each $\omega \in (0, 1)$. It follows that
\[
m(\{\omega \in (0,1) : Y(\omega) \le x\}) = m(\{\omega \in (0,1) : \omega \le F(x)\}) = F(x),
\]
so $F$ is the distribution function of $Y$. Moreover, $F_n$ is the distribution
function of $Y_n$ by a similar argument.

Now take any $\omega \in (0,1)$ and any $\varepsilon > 0$, $\eta > 0$. Let $x, y$ be continuity
points of $F$ such that
\[
Y(\omega) - \varepsilon < x < Y(\omega) \le Y(\omega + \eta) < y < Y(\omega + \eta) + \varepsilon.
\]
Then
\[
F(x) < \omega < \omega + \eta \le F(y).
\]
Since $\lim_{n \to \infty} F_n(x) = F(x)$ and $\lim_{n \to \infty} F_n(y) = F(y)$, we have
$F_n(x) < \omega < F_n(y)$, so
\[
Y(\omega) - \varepsilon < x < Y_n(\omega) \le y < Y(\omega + \eta) + \varepsilon
\]
for any sufficiently large $n$. It follows that $\lim_{n \to \infty} Y_n(\omega) = Y(\omega)$ whenever
$Y$ is continuous at $\omega$.

We put $X_n(\omega) = Y_n(\omega)$ and $X(\omega) = Y(\omega)$ at any continuity point $\omega$
of $Y$ and $X_n(\omega) = X(\omega) = 0$ at any discontinuity point $\omega$ of $Y$. Then
$\lim_{n \to \infty} X_n(\omega) = X(\omega)$ for every $\omega \in (0,1)$. The distributions of $X_n$ and $X$
are the same as those of $Y_n$ and $Y$, respectively, since the $X$s differ from
the corresponding $Y$s only on the set of discontinuity points of the
non-decreasing function $Y$, which is at most countable and hence of Lebesgue
measure 0. □
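
The role of the discontinuity points of $Y$ can be seen in a small example. The sketch below implements $Y(\omega) = \inf\{x : \omega \le F(x)\}$ for a discrete distribution and applies it to Bernoulli distributions with $P_n(\{1\}) = \frac{1}{2} + \frac{1}{n+2}$, which converge weakly to the Bernoulli distribution with parameter $\frac{1}{2}$. The example and the names `quantile`, `y_limit`, `y_n` are our own illustration, not part of the text.

```python
from bisect import bisect_left
from fractions import Fraction

def quantile(omega, atoms, probs):
    """Y(omega) = inf{x : omega <= F(x)} for a discrete distribution with
    increasing atoms and matching probabilities."""
    cumulative = []
    running = Fraction(0)
    for p in probs:
        running += p
        cumulative.append(running)
    # first index whose cumulative probability is >= omega
    return atoms[bisect_left(cumulative, omega)]

def y_limit(omega):
    """Quantile function of the limit P: mass 1/2 at 0 and at 1."""
    return quantile(omega, [0, 1], [Fraction(1, 2), Fraction(1, 2)])

def y_n(n, omega):
    """Quantile function of P_n, where P_n({1}) = 1/2 + 1/(n + 2)."""
    p_one = Fraction(1, 2) + Fraction(1, n + 2)
    return quantile(omega, [0, 1], [1 - p_one, p_one])

for omega in (Fraction(3, 10), Fraction(1, 2), Fraction(7, 10)):
    print(omega, y_limit(omega), [y_n(n, omega) for n in (1, 10, 1000)])
```

At $\omega = \frac{1}{2}$, the single discontinuity point of $Y$, we get $Y_n(\omega) = 1$ for every $n$ while $Y(\omega) = 0$, so pointwise convergence fails there; at the other two points $Y_n(\omega)$ eventually equals $Y(\omega)$. This is why the proof redefines the variables on the discontinuity set.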




Corollary 5.44
If $P_{X_n}$ converges weakly to $P_X$, then the characteristic functions of $X_n$ and $X$
satisfy $\lim_{n \to \infty} \phi_{X_n}(t) = \phi_X(t)$ for each $t$.


Proof Take the Skorohod representation $Y_n, Y$ of the measures $P_{X_n}, P_X$.
Pointwise convergence of $Y_n$ to $Y$ implies that $\phi_{Y_n}(t) = E(e^{itY_n}) \to \phi_Y(t) = E(e^{itY})$
as $n \to \infty$ by the dominated convergence theorem. But the distributions
of $X_n, X$ are the same as those of $Y_n, Y$, so the characteristic functions
are the same. □


The following result has many varied applications in analysis and prob- 
ability. 


Theorem 5.45 (Helly selection)

Let $F_1, F_2, \ldots$ be a sequence of distribution functions of probability measures.
Then there exists a subsequence $F_{n_1}, F_{n_2}, \ldots$ and a non-decreasing
right-continuous function $F$ such that $\lim_{k \to \infty} F_{n_k}(x) = F(x)$ at each continuity
point $x$ of $F$.


Proof Let $q_1, q_2, \ldots$ be a sequence consisting of all rational numbers.
Because the distribution functions have values in $[0, 1]$, there is a subsequence
$n^1_1, n^1_2, \ldots$ of the sequence $1, 2, \ldots$ such that the limit
$\lim_{k \to \infty} F_{n^1_k}(q_1) = G(q_1)$ exists. Moreover, there is a subsequence $n^2_1, n^2_2, \ldots$
of the sequence $n^1_1, n^1_2, \ldots$ such that $\lim_{k \to \infty} F_{n^2_k}(q_2) = G(q_2)$ exists, and so
on. Taking $n_k = n^k_k$ for $k = 1, 2, \ldots$, we then have
\[
\lim_{k \to \infty} F_{n_k}(q_i) = G(q_i)
\]
for every $i = 1, 2, \ldots$ . The functions $G$ and
\[
F(x) = \inf\{G(q) : x < q,\ q \in \mathbb{Q}\}
\]
are non-decreasing since so are the $F_n$. For each $x \in \mathbb{R}$ and $\varepsilon > 0$ there
is a $q \in \mathbb{Q}$ such that $x < q$ and $G(q) < F(x) + \varepsilon$. If $x < y < q$, then
$F(y) \le G(q) < F(x) + \varepsilon$. Hence $F$ is right-continuous.

If $F$ is continuous at $x$, we take $y < x$ such that $F(x) - \varepsilon < F(y)$. We
also take $q, r \in \mathbb{Q}$ such that $y < q < x < r$ and $G(r) < F(x) + \varepsilon$. Since
$F_n(q) \le F_n(x) \le F_n(r)$, it follows that
\begin{align*}
F(x) - \varepsilon < F(y) \le G(q) &= \lim_{k \to \infty} F_{n_k}(q) \le \liminf_{k \to \infty} F_{n_k}(x) \\
&\le \limsup_{k \to \infty} F_{n_k}(x) \le \lim_{k \to \infty} F_{n_k}(r) = G(r) < F(x) + \varepsilon.
\end{align*}
Because this holds for any $\varepsilon > 0$, we can conclude that $\lim_{k \to \infty} F_{n_k}(x) = F(x)$. □




Example 5.46
The $F$ in Helly's theorem does not need to be a distribution function. For
instance, if $F_n = 1_{[n,\infty)}$, then $\lim_{n \to \infty} F_n(x) = 0$ for each $x \in \mathbb{R}$.


In view of this example, we introduce a condition which ensures that a 
probability distribution is obtained in the limit. 


Definition 5.47

A sequence of probability measures $P_1, P_2, \ldots$ defined on $\mathcal{B}(\mathbb{R})$ is said to
be tight if for each $\varepsilon > 0$ there exists a finite interval $[-a, a]$ such that
$P_n([-a, a]) > 1 - \varepsilon$ for all $n = 1, 2, \ldots$ .


Example 5.48
If $P_n = \delta_n$ is the unit mass at $n = 1, 2, \ldots$, then $P_1, P_2, \ldots$ is not a tight
sequence.
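
A few lines of code make the failure of tightness in this example concrete: whatever $a$ is chosen, the interval $[-a, a]$ carries no mass under $\delta_n$ once $n > a$. The helper names below are illustrative only.

```python
def mass_in_interval(n, a):
    """P_n([-a, a]) for P_n = delta_n, the unit mass at n."""
    return 1.0 if -a <= n <= a else 0.0

def smallest_mass(a, n_max):
    """Minimum of P_n([-a, a]) over n = 1, ..., n_max."""
    return min(mass_in_interval(n, a) for n in range(1, n_max + 1))

for a in (5, 50, 500):
    print(a, smallest_mass(a, n_max=1000))
```

The minimum is 0 for each $a$ as soon as the range of $n$ extends beyond $a$, so no single interval works for all $n$.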


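Theorem 5.49 (Prokhorov)
If a sequence $P_1, P_2, \ldots$ of probability measures on $\mathcal{B}(\mathbb{R})$ is tight, then it
has a subsequence converging weakly to a probability measure $P$ on $\mathcal{B}(\mathbb{R})$.


Proof Let $F_n(x) = P_n((-\infty, x])$ for each $x \in \mathbb{R}$. By Helly's theorem,
there is a subsequence $F_{n_k}$ converging to a non-decreasing right-continuous
function $F$ at each continuity point of $F$.

We claim that $\lim_{y \to \infty} F(y) = 1$. Take any $\varepsilon > 0$. Tightness ensures that
there is an $a > 0$ such that $P_n([-a, a]) > 1 - \varepsilon$ for all $n$. Then for any continuity point
$y$ of $F$ such that $y > a$ we have
\[
F_n(y) = P_n((-\infty, y]) > 1 - \varepsilon \quad \text{for all } n = 1, 2, \ldots .
\]
Hence, $F(y) = \lim_{k \to \infty} F_{n_k}(y) \ge 1 - \varepsilon$. Because $1 \ge F(y) \ge 1 - \varepsilon$ for each
$\varepsilon > 0$, this proves that $\lim_{y \to \infty} F(y) = 1$. In the same way, $F_n(y) < \varepsilon$ for all $n$
whenever $y < -a$, which gives $\lim_{y \to -\infty} F(y) = 0$. It follows that $F$ is a distribution
function, and the corresponding probability measure $P$ on $\mathcal{B}(\mathbb{R})$ satisfies
$P_{n_k} \Rightarrow P$. □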


5.7 Central Limit Theorem 


Characteristic functions provide a powerful means of studying the distribu- 
tions of sums of independent random variables. Because of the following 
important theorem, characteristic functions can be used to study limit dis- 
tributions. 


Theorem 5.50 (continuity theorem)
Let $X_1, X_2, \ldots$ and $X$ be random variables such that $\phi_{X_n}(t) \to \phi_X(t)$ for each
$t \in \mathbb{R}$. Then $P_{X_n} \Rightarrow P_X$.


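Proof First we show that the sequence $P_{X_1}, P_{X_2}, \ldots$ is tight. For any $a > 0$
\begin{align*}
P_{X_n}([-2/a, 2/a]) &= 1 - P_{X_n}(\{x \in \mathbb{R} : |x| > 2/a\}) \\
&\ge 1 - 2 \int_{\{x \in \mathbb{R} : |x| > 2/a\}} \left(1 - \frac{1}{a|x|}\right) dP_{X_n}(x) \\
&\ge 1 - 2 \int_{\mathbb{R}} \left(1 - \frac{\sin(ax)}{ax}\right) dP_{X_n}(x) \\
&= 2 \int_{\mathbb{R}} \frac{\sin(ax)}{ax}\, dP_{X_n}(x) - 1.
\end{align*}
Using Fubini's theorem, we get
\begin{align*}
\int_{\mathbb{R}} \frac{\sin(ax)}{ax}\, dP_{X_n}(x)
&= \frac{1}{2a} \int_{\mathbb{R}} \left( \int_{[-a,a]} e^{itx}\, dm(t) \right) dP_{X_n}(x) \\
&= \frac{1}{2a} \int_{[-a,a]} \left( \int_{\mathbb{R}} e^{itx}\, dP_{X_n}(x) \right) dm(t) \\
&= \frac{1}{2a} \int_{[-a,a]} \phi_{X_n}(t)\, dm(t).
\end{align*}
Since $\phi_X$ is continuous at 0 and $\phi_X(0) = 1$, for any $\varepsilon > 0$ there is an $a > 0$
such that
\[
\left| \frac{1}{2a} \int_{[-a,a]} \phi_X(t)\, dm(t) - 1 \right| < \varepsilon.
\]
Furthermore, since $\phi_{X_n}(t)$ converges to $\phi_X(t)$ for each $t$, the dominated
convergence theorem (Theorem 1.43) implies that there exists an integer $N$
such that
\[
\left| \frac{1}{2a} \int_{[-a,a]} \phi_{X_n}(t)\, dm(t) - 1 \right| < 2\varepsilon
\]
for all $n \ge N$. It follows that there is an $a > 0$ such that
\[
P_{X_n}([-2/a, 2/a]) \ge \frac{1}{a} \int_{[-a,a]} \phi_{X_n}(t)\, dm(t) - 1 \ge 1 - 4\varepsilon
\]
for each $n \ge N$. We can ensure by taking a smaller $a$ that this inequality
holds for each $n$, which proves that the sequence $P_{X_1}, P_{X_2}, \ldots$ is tight.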
Now suppose that $P_{X_n}$ does not converge weakly to $P_X$. It means that
$F_{X_n}(x)$ does not converge to $F_X(x)$ at some continuity point $x \in \mathbb{R}$ of $F_X$.
It follows that there exist an $\eta > 0$ and a subsequence $n_1, n_2, \ldots$ of the
sequence $1, 2, \ldots$ such that
\[
|F_{X_{n_k}}(x) - F_X(x)| > \eta \quad \text{for all } k = 1, 2, \ldots . \tag{5.11}
\]
The subsequence $P_{X_{n_1}}, P_{X_{n_2}}, \ldots$ is tight because $P_{X_1}, P_{X_2}, \ldots$ is tight.
According to Prokhorov's theorem, there is a subsequence $m_1, m_2, \ldots$ of the
sequence $n_1, n_2, \ldots$ such that $P_{X_{m_k}}$ converges weakly to the probability
distribution $P_Y$ of some random variable $Y$. By Corollary 5.44, $\phi_{X_{m_k}}(t) \to \phi_Y(t)$.
On the other hand, $\phi_{X_{m_k}}(t) \to \phi_X(t)$ for each $t \in \mathbb{R}$, so this implies
$\phi_X = \phi_Y$. By Theorem 5.41, $P_X$ and $P_Y$ must coincide. This shows that
$P_{X_{m_k}} \Rightarrow P_X$, contradicting (5.11). ∎
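
The tightness estimate used in the first part of the proof, $P_X([-2/a, 2/a]) \ge \frac{1}{a}\int_{[-a,a]} \phi_X(t)\,dm(t) - 1$, can be checked numerically in a case where both sides have closed forms. The sketch assumes $X$ is standard normal, so that $\phi_X(t) = e^{-t^2/2}$ and both sides reduce to values of the error function; the function names are ours.

```python
import math

def lower_bound(a):
    """(1/a) * integral over [-a, a] of exp(-t^2/2) dt, minus 1."""
    integral = math.sqrt(2 * math.pi) * math.erf(a / math.sqrt(2))
    return integral / a - 1

def mass(a):
    """P_X([-2/a, 2/a]) for X standard normal."""
    return math.erf((2 / a) / math.sqrt(2))

for a in (0.1, 0.5, 1.0, 2.0):
    print(a, mass(a), lower_bound(a))
```

In each row the first value dominates the second, and the bound approaches 1 as $a$ decreases, which is the mechanism behind tightness.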


We conclude with a famous version of the Central Limit Theorem (CLT). 
We will concentrate on i.i.d. sequences, rather than seek to find the most 
general results. First we need some elementary inequalities. 


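Lemma 5.51
The following inequalities hold.
(i) For any complex numbers $z, w$ such that $|z| \le 1$ and $|w| \le 1$ and for
any $n = 1, 2, \ldots$
\[
|z^n - w^n| \le n|z - w|.
\]
(ii) For any $x \in \mathbb{R}$
\[
\left| e^{ix} - 1 - ix \right| \le \frac{1}{2} |x|^2.
\]
(iii) For any $x \in \mathbb{R}$
\[
\left| e^{ix} - 1 - ix - \frac{(ix)^2}{2} \right| \le \min\left( |x|^2, \frac{1}{6} |x|^3 \right).
\]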

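Proof (i) Since
\[
z^n - w^n = \left( z^{n-1} + z^{n-2} w + \cdots + z w^{n-2} + w^{n-1} \right)(z - w),
\]
it follows that
\begin{align*}
|z^n - w^n| &\le \left( |z|^{n-1} + |z|^{n-2} |w| + \cdots + |z| |w|^{n-2} + |w|^{n-1} \right) |z - w| \\
&\le n|z - w|.
\end{align*}
(ii) We have
\[
e^{ix} - 1 - ix = \int_0^x (s - x) e^{is}\, ds.
\]
Estimating the integral gives
\[
\left| e^{ix} - 1 - ix \right| = \left| \int_0^x (s - x) e^{is}\, ds \right| \le \frac{1}{2} |x|^2.
\]
(iii) We have
\[
e^{ix} - 1 - ix - \frac{(ix)^2}{2} = \frac{1}{2i} \int_0^x (s - x)^2 e^{is}\, ds.
\]
Estimating the integral gives
\[
\left| e^{ix} - 1 - ix - \frac{(ix)^2}{2} \right| = \left| \frac{1}{2i} \int_0^x (s - x)^2 e^{is}\, ds \right| \le \frac{1}{6} |x|^3.
\]
Moreover, from (ii)
\[
\left| e^{ix} - 1 - ix - \frac{(ix)^2}{2} \right| \le \left| e^{ix} - 1 - ix \right| + \left| \frac{(ix)^2}{2} \right| \le \frac{1}{2} |x|^2 + \frac{1}{2} |x|^2 = |x|^2,
\]
which completes the proof. □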
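
The three inequalities are easy to spot-check numerically. The sketch below evaluates them at a handful of sample points of our own choosing; the small tolerance `tol` only absorbs floating-point rounding and is not part of the lemma.

```python
import cmath

def check_lemma(points, n=7, tol=1e-12):
    """Return True if inequalities (i)-(iii) of the lemma hold at the sample points."""
    ok = True
    # (i): |z^n - w^n| <= n |z - w| for |z| <= 1 and |w| <= 1
    for z in points["complex"]:
        for w in points["complex"]:
            ok = ok and abs(z**n - w**n) <= n * abs(z - w) + tol
    for x in points["real"]:
        e = cmath.exp(1j * x)
        # (ii): |e^{ix} - 1 - ix| <= |x|^2 / 2
        ok = ok and abs(e - 1 - 1j * x) <= 0.5 * abs(x) ** 2 + tol
        # (iii): |e^{ix} - 1 - ix - (ix)^2 / 2| <= min(|x|^2, |x|^3 / 6)
        remainder = abs(e - 1 - 1j * x - (1j * x) ** 2 / 2)
        ok = ok and remainder <= min(abs(x) ** 2, abs(x) ** 3 / 6) + tol
    return ok

samples = {
    "complex": [0.9 * cmath.exp(1j * k) for k in range(6)] + [1, -1, 1j, 0],
    "real": [k / 10 for k in range(-100, 101)],
}
print(check_lemma(samples))
```

Such a check proves nothing, but it is a quick guard against sign or exponent slips when the bounds are reused, as in the proof of the Central Limit Theorem.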


Take a sequence of i.i.d. random variables $X_1, X_2, \ldots$ with finite mean
$m = E(X_1)$ and variance $\sigma^2 = \mathrm{Var}(X_1)$. Let $S_n = X_1 + \cdots + X_n$ and write
\[
T_n = \frac{S_n - mn}{\sigma \sqrt{n}}.
\]
All $T_n$ have expectation 0 and variance 1.


Theorem 5.52 (Central Limit Theorem)

Let $X_n$ be independent identically distributed random variables with finite
expectation and variance. Then $T_n \Rightarrow T$, where $T$ has the standard normal
distribution $N(0, 1)$.


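Proof Replacing $X_k$ by $\frac{X_k - m}{\sigma}$ shows that there is no loss of generality in
assuming $m = E(X_k) = 0$ and $\sigma^2 = \mathrm{Var}(X_k) = 1$. Let $\phi$ denote the characteristic
function of $X_k$ (the same for each $k = 1, 2, \ldots$). By Lemma 5.51 (iii),
for any $t \in \mathbb{R}$
\begin{align*}
\left| \phi(t) - \left(1 - \frac{t^2}{2}\right) \right|
&= \left| E\left( e^{itX_1} - 1 - itX_1 - \frac{(itX_1)^2}{2} \right) \right| \\
&\le E\left( \left| e^{itX_1} - 1 - itX_1 - \frac{(itX_1)^2}{2} \right| \right) \\
&\le E\left( |tX_1|^2\, 1_{\{|X_1| > |t|^{-1/4}\}} \right) + \frac{1}{6} E\left( |tX_1|^3\, 1_{\{|X_1| \le |t|^{-1/4}\}} \right) \\
&\le t^2 E\left( |X_1|^2\, 1_{\{|X_1| > |t|^{-1/4}\}} \right) + \frac{1}{6} |t|^{9/4}. \tag{5.12}
\end{align*}
Moreover, using Lemma 5.51 (ii) with $x = \frac{it^2}{2}$ (the integral estimate in its
proof remains valid for this $x$ because $|e^{is}| \le 1$ along the path of integration),
we find that for any $t \in \mathbb{R}$
\[
\left| e^{-\frac{t^2}{2}} - \left(1 - \frac{t^2}{2}\right) \right| \le \frac{1}{2} \left| \frac{it^2}{2} \right|^2 = \frac{t^4}{8}. \tag{5.13}
\]
Since
\[
\phi_{T_n}(t) = \phi^n\left( \frac{t}{\sqrt{n}} \right),
\]
by Lemma 5.51 (i),
\begin{align*}
\left| \phi_{T_n}(t) - e^{-\frac{t^2}{2}} \right|
&= \left| \phi^n\left( \frac{t}{\sqrt{n}} \right) - \left( e^{-\frac{t^2}{2n}} \right)^n \right| \\
&\le n \left| \phi\left( \frac{t}{\sqrt{n}} \right) - e^{-\frac{t^2}{2n}} \right| \\
&\le n \left| \phi\left( \frac{t}{\sqrt{n}} \right) - \left(1 - \frac{t^2}{2n}\right) \right| + n \left| e^{-\frac{t^2}{2n}} - \left(1 - \frac{t^2}{2n}\right) \right|,
\end{align*}
and from (5.12), (5.13) we get
\begin{align*}
n \left| \phi\left( \frac{t}{\sqrt{n}} \right) - \left(1 - \frac{t^2}{2n}\right) \right|
&\le n \frac{t^2}{n} E\left( |X_1|^2\, 1_{\{|X_1| > |t|^{-1/4} n^{1/8}\}} \right) + n \frac{1}{6} \left| \frac{t}{\sqrt{n}} \right|^{9/4} \\
&= t^2 E\left( |X_1|^2\, 1_{\{|X_1| > |t|^{-1/4} n^{1/8}\}} \right) + \frac{1}{6} \frac{|t|^{9/4}}{n^{1/8}} \\
&\to 0 \quad \text{as } n \to \infty
\end{align*}
and
\[
n \left| e^{-\frac{t^2}{2n}} - \left(1 - \frac{t^2}{2n}\right) \right| \le n \frac{t^4}{8n^2} = \frac{t^4}{8n} \to 0 \quad \text{as } n \to \infty.
\]
This shows that $\lim_{n \to \infty} \phi_{T_n}(t) = e^{-\frac{t^2}{2}}$ for each $t \in \mathbb{R}$. By the continuity
theorem this means that $T_n \Rightarrow T$, where $T$ has the standard normal distribution
$N(0, 1)$. □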
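
The convergence $\phi_{T_n}(t) = \phi^n(t/\sqrt{n}) \to e^{-t^2/2}$ at the heart of the proof can be observed numerically in a simple case. The sketch below assumes, purely for illustration, that $X_k$ takes the values $+1$ and $-1$ with probability $\frac{1}{2}$ each, so that $m = 0$, $\sigma = 1$ and $\phi(t) = \cos t$; this choice and the function name `phi_Tn` are ours.

```python
import math

def phi_Tn(t, n):
    """Characteristic function of T_n when X_k = +1 or -1 with probability 1/2,
    so that phi(t) = cos(t) and phi_{T_n}(t) = cos(t / sqrt(n)) ** n."""
    return math.cos(t / math.sqrt(n)) ** n

t = 1.0
limit = math.exp(-t * t / 2)
for n in (10, 100, 10000):
    value = phi_Tn(t, n)
    print(n, value, abs(value - limit))
```

The error shrinks roughly like $1/n$ here, faster than the general bound in the proof, because this particular distribution has finite fourth moment.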


The following result, which justifies our claims for the limiting behaviour 
of binomial distributions in Example 1.23 can be deduced from the Central 
Limit Theorem. 


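Corollary 5.53 (de Moivre–Laplace theorem)
Let $X_1, X_2, \ldots$ be i.i.d. random variables with $P(X_n = 1) = P(X_n = 0) = \frac{1}{2}$
for each $n = 1, 2, \ldots$ . Then for each $a < b$
\[
P\left( a < \frac{S_n - n/2}{\sqrt{n}/2} \le b \right) \to \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{x^2}{2}}\, dx \quad \text{as } n \to \infty.
\]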


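Proof Note that the expectation and variance of $X_n$ are
\[
E(X_n) = \frac{1}{2}, \qquad \mathrm{Var}(X_n) = \frac{1}{4}
\]
and apply the Central Limit Theorem, observing that
\[
T_n = \frac{S_n - n/2}{\sqrt{n}/2}. \qquad \square
\]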
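
To get a feeling for the quality of the normal approximation, the exact binomial probability can be compared with the limiting integral. The following sketch does this for $n = 100$ and $(a, b) = (-1, 1)$, values chosen by us for illustration; the function names are not from the text.

```python
from math import comb, erf, sqrt

def exact_probability(n, a, b):
    """P(a < (S_n - n/2) / (sqrt(n)/2) <= b) for S_n binomial with p = 1/2."""
    total = 0
    for k in range(n + 1):
        t = (k - n / 2) / (sqrt(n) / 2)
        if a < t <= b:
            total += comb(n, k)
    return total / 2**n

def normal_probability(a, b):
    """(1/sqrt(2 pi)) * integral from a to b of exp(-x^2/2) dx."""
    cdf = lambda x: 0.5 * (1 + erf(x / sqrt(2)))
    return cdf(b) - cdf(a)

n, a, b = 100, -1.0, 1.0
print(exact_probability(n, a, b), normal_probability(a, b))
```

The two values differ only in the third decimal place, even though $n = 100$ is moderate.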


Exercise 5.14 Use the de Moivre—Laplace theorem to estimate the 
probability that the number of ‘heads’ obtained in n = 10000 tosses 
of a fair coin lies in (a, b) = (4900, 5100). 


Example 5.54 

In Example 1.6 stock prices were modelled by $n = 20$ equally likely additive
up/down jumps of 0.50 from an initial price 10. This gives the price
after $n$ such jumps as
\[
Y_n = 10 + 0.5 \sum_{i=1}^{n} (2X_i - 1) = 10 + S_n - \frac{n}{2} = 10 + \frac{\sqrt{n}}{2} T_n,
\]
where $X_1, X_2, \ldots$ are i.i.d. random variables with $P(X_n = 1) = P(X_n = 0) = \frac{1}{2}$.
Since $E(X_n) = \frac{1}{2}$ and $\mathrm{Var}(X_n) = \frac{1}{4}$, by the CLT we have $T_n \Rightarrow T$, where
$T$ has the standard normal distribution $N(0, 1)$. It means that the distribution
of $Y_n$ can be approximated by the normal distribution $N(\mu, \sigma^2)$ with $\mu = 10$
and $\sigma = \frac{\sqrt{n}}{2} = 2.236$ when $n = 20$, as in Example 1.23.


Example 5.55 
Consider a sequence of i.i.d. random variables $K_1, K_2, \ldots$ with distribution
\[
P(K_n = u) = P(K_n = d) = \frac{1}{2} \quad \text{for } n = 1, 2, \ldots,
\]
where $-1 < d < u$. In a binomial model with multiplicative jumps the stock
prices at time step $n$ are given by
\[
S(n) = S(0)(1 + K_1) \times \cdots \times (1 + K_n)
\]
with $S(0) > 0$ being the initial stock price (the spot price); see Example 1.7,
where
\[
u = 0.05, \quad d = -0.05, \quad S(0) = 10, \quad n = 20. \tag{5.14}
\]
We want to understand the limiting distribution of $S(n)$ for large $n$. To this
end, take
\[
X_n = \ln(1 + K_n)
\]
for $n = 1, 2, \ldots$, which form an i.i.d. sequence of random variables. This
gives
\[
S(n) = S(0)\, e^{X_1 + \cdots + X_n}.
\]
Suppose that
\[
E(X_n) = \frac{m}{n}, \qquad \mathrm{Var}(X_n) = \frac{\sigma^2}{n}
\]
for some parameters $m$ and $\sigma > 0$ (which can be expressed in terms of
$u, d$). Then, according to the CLT, for
\[
T_n = \frac{\sum_{i=1}^{n} X_i - m}{\sigma}
\]
we have $T_n \Rightarrow T$, where $T$ has the standard normal distribution $N(0, 1)$.
As a result, $\sum_{i=1}^{n} X_i \Rightarrow X$, where $X$ has the normal distribution $N(m, \sigma^2)$
with mean $m$ and variance $\sigma^2$, and this implies that
\[
S(n) \Rightarrow S,
\]
where $\ln S$ has the normal distribution $N(\mu, \sigma^2)$ with $\mu = \ln S(0) + m$.

In other words, the distribution of $S$ is log-normal with parameters $\mu, \sigma$,
see Example 1.24. The numerical values $\mu = 2.2776$ and $\sigma = 0.2238$ in
that example have been computed from $u, d, S(0), n$ in (5.14).
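
The quoted numerical values can be reproduced in a few lines. The sketch below takes $m$ and $\sigma^2$ to be the mean and variance of $X_1 + \cdots + X_n$ with $X_k = \ln(1 + K_k)$, as in the example; the function name `lognormal_parameters` is our own.

```python
import math

def lognormal_parameters(u, d, s0, n):
    """Return (mu, sigma) with mu = ln S(0) + m, where m and sigma^2 are the
    mean and variance of X_1 + ... + X_n for X_k = ln(1 + K_k)."""
    up, down = math.log(1 + u), math.log(1 + d)
    mean_step = 0.5 * (up + down)
    var_step = 0.5 * (up - mean_step) ** 2 + 0.5 * (down - mean_step) ** 2
    m = n * mean_step
    sigma = math.sqrt(n * var_step)
    return math.log(s0) + m, sigma

mu, sigma = lognormal_parameters(u=0.05, d=-0.05, s0=10, n=20)
print(round(mu, 4), round(sigma, 4))  # 2.2776 0.2238
```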


Index 


additivity, 2 
antiderivative, 15 
atom, 110 


binomial tree model, 56 


contingent claim, 46 
convergence 

almost surely, 157 

in distribution, 171 

in L¹-norm, 157

in L²-norm, 156

in probability, 160 

weak, 176 
convolution, 86 
correlation coefficient, 96 
countable additivity, 6, 7 
covariance, 96 
covariance matrix, 98 


d-system, 99 
density 
bivariate normal, 75, 85 
conditional, 127 
Gaussian, 16 
joint, 75, 82 
log-normal, 17 
marginal, 80 
multivariate normal, 83 
normal, 16 
of a random variable, 50 
probability, 42 
derivative security, 46 
distribution 
binomial, 40 
continuous, 42 
discrete, 41 
exponential, 43 
function, 40, 49 
geometric, 50 
joint, 73, 82 
log-normal, 43, 52 
marginal, 74, 82 
negative binomial, 50 
normal, 42, 50 
of a random variable, 48 


Poisson, 41 
probability, 40 
standard normal, 51 


event, 6 

independence of, 87, 89 
expectation, 57 

conditional, 112, 113, 120 


Fourier coefficients, 150 
Fourier representation, 150 
function 
Borel measurable, 28 
characteristic, 63 
convex, 128 
Haar, 151 
indicator, 18 
integrable, 22, 24 
Lebesgue integrable, 28 
measurable, 20 
norm continuous, 133 
simple, 18 
inequality 
Bessel, 149 
Chebyshev, 62 
Jensen, 129 
Markov, 61 
Schwarz, 97 
inner product, 131 
integral, 21, 24 
Lebesgue, 28 
Riemann, 14 
inverse image, 19 


L¹-norm, 138

L²-norm, 132

law of large numbers, 167 

lemma 
Borel—Cantelli, first, 161 
Borel—Cantelli, second, 162 
Fatou, 30 


measure, 10
  absolutely continuous, 140
  change of, 26
  counting, 11
  Dirac, 9
  finite, 10
  Lebesgue, on R, 12
  Lebesgue, on R², 72
  Lebesgue, on Rⁿ, 73
  outer, 33
  probability, 7
  product, 71
  restriction of, 71
  σ-finite, 71
  space, 10
  tight sequence of, 179
  unit mass, 9
moment, 62
  central, 62
moment generating function, 65
Monte Carlo simulation, 53


nearest point, 136


option
  arithmetic Asian call, 56
  bottom straddle, 48
  bull spread, 48
  butterfly spread, 48
  call, 47
  European, 46
  path-dependent, 56
  put, 47
  strangle, 48
orthogonal projection, 137
orthonormal basis, 150


parallelogram law, 135
Parseval identity, 151
partition, 109
probability
  binomial, 3, 9
  conditional, 88
  density, 42
  distribution, 40
  equivalent, 142
  geometric, 6
  measure, 7
  Poisson, 5
  space, 6
  uniform, 3, 13
probability generating function, 65


Radon–Nikodym derivative, 142
random variable, 47
  continuous, 49
  discrete, 49
  i.i.d., 167
  independence of, 84, 86
  jointly continuous, 75, 82
  orthogonal, 135
  square-integrable, 97
  uncorrelated, 96
  uniformly integrable family of, 164
random vector, 73, 81
  Gaussian, 83
  independence of, 91


set
  Borel, 11, 68, 81
  Cantor, 13
  closed, 134
  complete orthonormal, 150
  orthonormal, 148
σ-field, 6
  generated by a family of sets, 69
  generated by a random variable, 47
  independence of, 90, 92
  product of, 68
space
  L¹, 138
  L², 131
  measure, 10
  probability, 6
  sample, 2
standard deviation, 60


theorem
  Central Limit Theorem (CLT), 182
  continuity, 180
  de Moivre–Laplace, 184
  dominated convergence, 31
  Fubini, 77
  Helly selection, 178
  monotone convergence, 22
  Prokhorov, 179
  Pythagoras, 135
  Radon–Nikodym, 140, 145
  Skorohod representation, 177
tower property, 118
translation invariance, 13, 29


variance, 60


